Abstract

This paper explores the preference-based top-K rank aggregation problem. Suppose that a
collection of items is repeatedly compared in pairs, and one wishes to recover a consistent
ordering that emphasizes the top-K ranked items, based on partially revealed preferences.
For concreteness, this work focuses on the popular Bradley-Terry-Luce model that postulates
a set of latent preference scores underlying all items, where the odds of paired
comparisons depend only on the relative scores of the items involved.

We characterize the minimax limits on the identifiability of top-K ranked items, in
the presence of random and non-adaptive sampling. Our findings highlight a separation
measure that quantifies the gap of preference scores between the $K^{\text{th}}$ and $(K+1)^{\text{th}}$ ranked
items. In an attempt to approach this minimax limit, we propose a nearly linear-time
ranking scheme, called Spectral MLE, that returns the indices of the top-K items in accordance
with a careful score estimate. In a nutshell, Spectral MLE starts with an initial score
estimate with minimal squared loss (obtained via a spectral method), and then successively
refines each component with the assistance of coordinate-wise MLEs. Encouragingly, Spectral
MLE achieves perfect top-K item identification with minimal sample complexity. The
practical applicability of Spectral MLE is further corroborated by numerical experiments.
Keywords: Bradley-Terry-Luce model, top-K ranking, linear-time algorithm, minimax
limits, coordinate-wise MLE


1. Introduction and Motivation 


The task of rank aggregation is encountered in a wide spectrum of contexts like social
choice (Caplin and Nalebuff, 1991; Soufiani et al., 2014b), web search and information
retrieval (Brin and Page, 1998; Dwork et al., 2001), crowd sourcing (Chen et al., 2013), and
recommendation systems. Given partial preference
information over a collection of items, the aim is to identify a consistent ordering that best 
respects the revealed preference. In the high-dimensional regime, one is often faced with 
two challenges: (1) the number of items to be ranked is ever growing, which makes it 
increasingly harder to recover a consistent total ordering over all items; (2) the observed 


Chen and Suh 


data is highly incomplete and inconsistent: only a small number of noisy pairwise / listwise 
preferences can be acquired. 


In an effort to address such challenges, this paper explores a popular pairwise preference- 
based model, which postulates the existence of a ground-truth ranking. Specifically, consider 
a parametric model involving n items, each assigned a preference score that determines its 
rank. Concrete examples of preference scores include the overall rating of an athlete, the 
academic performance and competitiveness of a university, the dining quality of a restaurant, 
etc. Each item is then repeatedly compared against a few others in pairs, yielding a set 
of noisy binary comparisons generated based on the relative preference scores. In many 
situations, the number of repeated comparisons essentially reflects the signal-to-noise ratio 
(SNR) or the quality of the information revealed for each pair of items. The goal is then 
to develop a “denoising” procedure that recovers the ground-truth ranking, ideally from a 
minimal number of noisy observations. 

There has been a proliferation of ranking schemes that suggest partial solutions. While
the ranking that we are seeking is better treated as a function of the preference parameters,
most of the aforementioned schemes adopt the natural “plug-in” procedure, that is, start
by inferring the preference scores, and then return a ranking in accordance with the parametric
estimates. The most popular paradigm is arguably the maximum likelihood estimation
(MLE) (Ford, 1957), where the main appeal of MLE is its inherent convexity under several
comparison models, e.g. the Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952;
Luce, 1959). Encouragingly, MLE often achieves low $\ell_2$ estimation loss while retaining efficient
finite-sample complexity. Another prominent alternative concerns a family of spectral
ranking algorithms (e.g. PageRank (Brin and Page, 1998)). A provably efficient choice
within this family is Rank Centrality (Negahban et al., 2012), which produces an estimate
with nearly minimax mean squared error (MSE). While both MLE and Rank Centrality
allow intriguing guarantees towards finding faithful parametric estimates, the squared loss
metric considered therein does not necessarily imply optimality of the ranking accuracy. In
fact, there is no shortage of high-dimensional situations that admit parametric estimates
with low squared loss while precluding reliable ranking. Furthermore, many realistic scenarios
emphasize only a few items that receive the highest ranks. Unfortunately, the above
MSE results fall short of ensuring recovery of the top-ranked items.


In this work, we consider accurate identification of top-K ranked items under the popular
BTL pairwise comparison model, assuming that the item pairs we can compare are selected 
in a random and non-adaptive fashion (termed passive ranking). In particular, we aim to 
explore the following questions: (a) what is the minimum number of repeated comparisons 
necessary for reliable ranking? (b) how is the ranking accuracy affected by the underlying 
preference score distributions? We will address these two questions from both statistical 
and algorithmic perspectives. 


1.1 Main Contributions 

This paper investigates minimax optimal procedures for top-K rank aggregation. Our
contributions are two-fold. 

To begin with, we characterize the fundamental three-way tradeoff between the number
of repeated comparisons, the sparsity of the comparison graph, and the preference score
distribution, from a minimax perspective. In particular, we emphasize a separation measure
that quantifies the gap of preference scores between the $K^{\text{th}}$ and $(K+1)^{\text{th}}$ ranked items.
Our results demonstrate that the minimal sample complexity or quality of paired evaluation
(reflected by the number of repeated comparisons per observed pair) scales inversely with
the separation measure at a quadratic rate.

Algorithmically, we propose a nearly linear-time two-stage procedure, called Spectral
MLE, which allows perfect top-K identification as soon as the sample complexity exceeds
the minimax limits (modulo some constant). Specifically, Spectral MLE starts by obtaining
careful initial scores that are faithful in the $\ell_2$ sense (e.g. via a spectral ranking method),
and then iteratively sharpens the pointwise estimates via a careful comparison between the
running iterates and the coordinate-wise MLEs. This algorithm is designed primarily in an
attempt to seek a score estimate with minimal pointwise loss which, in turn, guarantees
optimal ranking accuracy. Furthermore, numerical experiments demonstrate that Spectral
MLE outperforms Rank Centrality by achieving lower estimation error and higher
ranking accuracy.


1.2 Prior Art 


There are two distinct families of observation models that have received considerable interest:
(1) value-based model, where the observation on each item is drawn only from the
distribution underlying this individual; (2) preference-based model, where one observes the
relative order among a few items instead of revealing their individual values. Best-K identification
in the value-based model with adaptive sampling (termed active ranking) is closely
related to the multi-armed bandit problem, where the fundamental identification complexity
has been characterized (Gabillon et al., 2011; Bubeck et al., 2013; Jamieson et al., 2014).
In addition, the value-based and preference-based models have been compared in terms of
minimax error rates in estimating the latent quantities; see (Shah et al., 2014).


In the realm of pairwise preference settings, many active ranking schemes (Busa-Fekete and
Hüllermeier, 2014) have been proposed in an attempt to optimize the exploration-exploitation
tradeoff. For instance, in the noise-free case, (Jamieson and Nowak, 2011) considered perfect
total ranking and characterized the query complexity gain of adaptive sampling relative
to random queries, provided that the items under study admit a low-dimensional Euclidean
embedding. Furthermore, several works (Ailon, 2012; Jamieson and Nowak, 2011;
Braverman and Mossel, 2008; Wauthier et al., 2013) explored the query complexity in the
presence of noise, but were basically designed to recover “approximately correct” total
rankings (a solution with loss at most a factor $(1 + \epsilon)$ from optimal) rather than accurate
ordering. Another path-based approach has been proposed to accommodate accurate top-K
queries from noisy pairwise data (Eriksson, 2013), where the observation error is assumed
to be i.i.d. instead of being item-dependent. Motivated by the success of value-based racing
algorithms, (Busa-Fekete et al., 2013; Busa-Fekete and Hüllermeier, 2014) came up with
a generalized racing algorithm that often led to efficient sample complexity. In contrast,
the current paper concentrates on top-K identification in a passive setting, assuming that
partial preferences are collected in a noisy, random, and non-adaptive manner. This was
previously out of reach.
Apart from Rank Centrality and MLE, the most relevant work is (Rajkumar and Agarwal,
2014). For a variety of rank aggregation methods, they developed intriguing sufficient statistical
hypotheses that guarantee the convergence to an optimal ranking, which in turn leads
to sample complexity bounds for Rank Centrality and MLE. Nevertheless, they focused on
perfect total ordering instead of top-K selection, and their results fall short of a rigorous
justification as to whether or not the derived sample complexity bounds are statistically
optimal.

Finally, there are many related yet different problem settings considered in the prior
literature. For instance, the work (Ammar and Shah, 2012) approached top-K ranking
using a maximum entropy principle, assuming the existence of a distribution $\mu$ over all
possible permutations. Recent work (Soufiani et al., 2013, 2014a) investigated consistent
rank breaking under more generalized models involving full rankings. A family of distance
measures on rankings has been studied as well (Farnoud and Milenkovic, 2014). Another
line of works considered the popular distance-based Mallows model (Lu and Boutilier, 2011;
Busa-Fekete et al., 2014; Awasthi et al., 2014). An online ranking setting has been studied
as well (Harrington, 2003; Farnoud et al., 2014). More broadly, the minimax recovery
limits under general pairwise measurements have recently been determined by (Chen et al.,
2015b). These are beyond the scope of the present work.


1.3 Organization and Notation 

The remainder of the paper is organized as follows. Section 2 introduces the pairwise
comparison model as well as the key performance metrics for the top-K ranking task.
The main results, including a fundamental minimax lower limit and an achievability result
by nearly linear-time algorithms, are summarized and discussed in Section 3. Section 4
presents the detailed procedure and performance guarantees of the proposed Spectral MLE
algorithm, and provides a heuristic treatment as to why it is expected to control the $\ell_\infty$
estimation error. We conclude the paper with a summary of our findings and a discussion
of future research directions in Section 5. The proofs of the ranking performance of
Spectral MLE (i.e. Theorem 7) and the minimax lower bound (i.e. Theorem 3) are deferred
to Appendix A and Appendix B, respectively.

Before continuing, we provide a brief summary of the notations used throughout the
paper. Let $[n]$ represent $\{1, 2, \cdots, n\}$. We denote by $\|w\|$, $\|w\|_1$, $\|w\|_\infty$ the $\ell_2$ norm,
$\ell_1$ norm, and $\ell_\infty$ norm of $w$, respectively. A graph $\mathcal{G}$ is said to be an Erdős-Rényi random
graph, denoted by $\mathcal{G}_{n, p_{\text{obs}}}$, if each pair $(i, j)$ is connected by an edge independently with
probability $p_{\text{obs}}$. Besides, we use $\deg(i)$ to represent the degree of vertex $i$ in $\mathcal{G}$.

Additionally, for any two sequences $f(n)$ and $g(n)$, $f(n) \gtrsim g(n)$ or $f(n) = \Omega(g(n))$ mean
that there exists a constant $c$ such that $f(n) \ge c\, g(n)$; $f(n) \lesssim g(n)$ or $f(n) = O(g(n))$ mean
that there exists a constant $c$ such that $f(n) \le c\, g(n)$; and $f(n) \asymp g(n)$ or $f(n) = \Theta(g(n))$
mean that there exist constants $c_1$ and $c_2$ such that $c_1 g(n) \le f(n) \le c_2 g(n)$.


2. Problem Setup 

To formalize matters, we present mathematical setups and key performance metrics in this
section.

Comparison Model and Assumptions. Suppose that we observe a few pairwise evaluations
over n items. To pursue a statistical understanding towards the ranking limits,
we assume that the pairwise comparison outcomes are generated according to the BTL
model (Bradley and Terry, 1952; Luce, 1959), a long-standing model that has been applied
in numerous applications (Agresti, 2014; Hunter, 2004).


• Preference Scores. The BTL model hypothesizes on the existence of some hidden preference
vector $w = [w_i]_{1 \le i \le n}$, where $w_i$ represents the underlying preference score / weight
of item $i$. The outcome of each paired comparison depends only on the scores of the items
involved. Without loss of generality, we will assume throughout that

$$w_1 \ge w_2 \ge \cdots \ge w_n > 0 \tag{1}$$

unless otherwise specified.

• Comparison Graph. Denote by $\mathcal{G} = ([n], \mathcal{E})$ the comparison graph such that items $i$ and
$j$ are compared if and only if $(i, j)$ belongs to the edge set $\mathcal{E}$. We will mostly assume that
$\mathcal{G}$ is drawn from the Erdős-Rényi model $\mathcal{G} \sim \mathcal{G}_{n, p_{\text{obs}}}$ for some observation ratio $p_{\text{obs}}$.

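• (Repeated) Pairwise Comparisons. For each $(i, j) \in \mathcal{E}$, we observe $L$ independent paired
comparisons between items $i$ and $j$. The outcome of the $l$-th comparison between them,
denoted by $y_{i,j}^{(l)}$, is generated as per the BTL model:

$$y_{i,j}^{(l)} = \begin{cases} 1, & \text{with probability } \dfrac{w_i}{w_i + w_j}, \\ 0, & \text{else}, \end{cases} \tag{2}$$

where $y_{i,j}^{(l)} = 1$ indicates a win by $i$ over $j$. We adopt the convention that $y_{j,i}^{(l)} = 1 - y_{i,j}^{(l)}$. It
is assumed throughout that conditional on $\mathcal{G}$, the $y_{i,j}^{(l)}$'s are jointly independent across all
$l$ and $i > j$. For ease of presentation, we introduce the collection of sufficient statistics as

$$y_i := \{ y_{i,j} \mid j : (i, j) \in \mathcal{E} \}; \qquad y_{i,j} := \frac{1}{L} \sum_{l=1}^{L} y_{i,j}^{(l)}. \tag{3}$$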


• Signal to Noise Ratio (SNR) / Quality of Comparisons. The overall faithfulness of the
acquired evaluation between items $i$ and $j$ is captured by the sufficient statistic $y_{i,j}$. Its
SNR can be captured by

$$\text{SNR} := \frac{\mathbb{E}^2[y_{i,j}]}{\mathrm{Var}[y_{i,j}]} \asymp L. \tag{4}$$

As a result, the number $L$ of repeated comparisons measures the SNR or the quality of
comparisons over any observed pair of items.


• Dynamic Range of Preference Scores. It is assumed throughout that the dynamic range
of the preference scores is fixed irrespective of $n$, namely,

$$w_i \in [w_{\min}, w_{\max}], \qquad 1 \le i \le n \tag{5}$$

for some positive constants $w_{\min}$ and $w_{\max}$ bounded away from 0, which amounts to the
most challenging regime (Negahban et al., 2012). In fact, the case in which the range
$w_{\max} / w_{\min}$ grows with $n$ can be readily translated into the above fixed-range regime by
first separating out those items with vanishing scores (e.g. via a simple voting method
like Borda count (Ammar and Shah, 2011)).

Performance Metric. Given these pairwise observations, one wishes to see whether or
not the top-K ranked items are identifiable. To this end, we consider the probability of
error $P_e$ in isolating the set of top-K ranked items, i.e.

$$P_e(\psi) := \mathbb{P}\{ \psi(y) \ne [K] \}, \tag{6}$$

where $\psi$ is any ranking scheme that returns a set of $K$ indices. Here, $[K]$ denotes the
(unordered) set of the first $K$ indices. We aim to characterize the fundamental admissible
region of $(L, p_{\text{obs}})$ where reliable top-K ranking is feasible, i.e. $P_e$ can be vanishingly small
as $n$ grows.


3. Minimax Ranking Limits 

We explore the fundamental ranking limits from a minimax perspective, which centers on
the design of robust ranking schemes that guard against the worst case in probability of
error. The most challenging component of top-K rank aggregation often hinges upon distinguishing
the two items near the decision boundary, i.e. the $K^{\text{th}}$ and $(K+1)^{\text{th}}$ ranked
items. Due to the random nature of the acquired finite-bit comparisons, the information
concerning their relative preference could be obliterated by noise, unless their latent preference
scores are sufficiently separated. In light of this, we single out a preference separation
measure as follows

$$\Delta_K := \frac{w_K - w_{K+1}}{w_{\max}}. \tag{7}$$

As will be seen, this measure plays a crucial role in determining information integrity for
top-K identification.
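For concreteness, the separation measure and the error event behind $P_e$ can be written as small helpers. This is an illustrative sketch only: the function names are hypothetical, items are indexed from 0, and `separation_scale` returns the order-wise quantity $\sqrt{\log n / (n p_{\text{obs}} L)}$ with every universal constant set to 1, which is an assumption of the sketch rather than a value given in the text.

```python
import math


def separation(w, K, w_max):
    """Delta_K = (w_K - w_{K+1}) / w_max, computed on scores sorted decreasingly."""
    s = sorted(w, reverse=True)
    return (s[K - 1] - s[K]) / w_max


def separation_scale(n, p_obs, L):
    """Order-wise scale sqrt(log n / (n p_obs L)); constants are omitted (assumed 1)."""
    return math.sqrt(math.log(n) / (n * p_obs * L))


def top_k_error(estimate, w, K):
    """True if the top-K index set of `estimate` differs from that of `w`."""
    def top(v):
        return set(sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:K])
    return top(estimate) != top(w)
```

Comparing `separation(w, K, w_max)` against `separation_scale(n, p_obs, L)` gives a rough feel for how close a given instance sits to the boundary, keeping in mind that the constants are not specified here.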

To model random sampling and partial observations, we employ the Erdős-Rényi random
graph $\mathcal{G} \sim \mathcal{G}_{n, p_{\text{obs}}}$. As already noted by (Ford, 1957), if the comparison graph $\mathcal{G}$ is
not connected, then there is absolutely no basis to determine relative preferences between
two disconnected components. Therefore, a reasonable necessary condition that one would
expect is the connectivity of $\mathcal{G}$, which requires

$$p_{\text{obs}} > \frac{\log n}{n}. \tag{8}$$

All results presented in this paper will operate under this assumption.

A main finding of this paper is an order-wise tight sufficient condition for top-K identifiability,
as stated in the theorem below.


Theorem 1 (Identifiability) Suppose that $\mathcal{G} \sim \mathcal{G}_{n, p_{\text{obs}}}$ with $p_{\text{obs}} > c_0 \log n / n$. Assume
that $L = O(\mathrm{poly}(n))$ and $w_{\max} / w_{\min} = \Theta(1)$. With probability exceeding $1 - c_1 n^{-2}$,
the set of top-K ranked items can be identified exactly by an algorithm that runs in time
$O(|\mathcal{E}| \log^2 n)$, provided that

$$L \ge \frac{c_2 \log n}{n p_{\text{obs}} \Delta_K^2}. \tag{9}$$

Here, $c_0, c_1, c_2 > 0$ are some universal constants.


Remark 2 We assume throughout that the input fed to each ranking algorithm is the sufficient
statistic $\{ y_{i,j} \mid (i, j) \in \mathcal{E} \}$ rather than the entire collection of $y_{i,j}^{(l)}$, otherwise even
reading all data takes at least $O(L \cdot |\mathcal{E}|)$ flops / time.

Theorem 1 characterizes an identifiable region within which exact identification of top-K
items is plausible by nearly linear-time algorithms. The algorithm we propose, as detailed in
Section 4, attempts recovery by computing a score estimate whose errors can be uniformly
controlled across all entries. Afterwards, the algorithm reports the K items that receive the
highest estimated scores.

Encouragingly, the above identifiable region is minimax optimal. Consider a given separation
condition $\Delta_K$, and suppose that nature behaves in an adversarial manner by choosing
the worst-case scores $w$ compatible with $\Delta_K$. This imposes a minimax lower bound on the
quality of comparisons necessary for reliable ranking, as given below.

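Theorem 3 (Minimax Lower Bounds) Fix $\epsilon \in (0, \frac{1}{2})$, and let $\mathcal{G} \sim \mathcal{G}_{n, p_{\text{obs}}}$. If

$$L \le c \, \frac{(1 - \epsilon) \log n - 2}{n p_{\text{obs}} \Delta_K^2} \tag{10}$$

holds for some constant $c > 0$, then for any ranking scheme $\psi$, there exists a preference
vector $w$ with separation $\Delta_K$ such that $P_e(\psi) \ge \epsilon$.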



Theorem 3 taken collectively with Theorem 1 determines the scaling of the fundamental
ranking boundary on $L$. Since the sample size sharply concentrates around $n^2 p_{\text{obs}} L$ in our
model, this implies that the required sample complexity for top-K ranking scales inversely
with the preference separation at a quadratic rate. Put another way, Theorem 3 justifies
the need for a minimum separation criterion that applies to any ranking scheme:

$$\Delta_K \gtrsim \sqrt{\frac{\log n}{n p_{\text{obs}} L}}. \tag{11}$$

Somewhat unexpectedly, there is no computational barrier away from this statistical limit
(at least in an order-wise sense). Several other remarks of Theorem 1 and Theorem 3 are
in order.


• $\ell_2$ Loss vs. $\ell_\infty$ Loss. A dominant fraction of prior methods focus on the mean squared
error in estimating the latent scores $w$. It was established by (Negahban et al., 2012)
that the minimax $\ell_2$ regret is squeezed between

$$\frac{1}{\sqrt{n p_{\text{obs}} L}} \lesssim \inf_{\hat{w}} \sup_{w} \frac{\mathbb{E}[\|\hat{w} - w\|]}{\|w\|} \lesssim \sqrt{\frac{\log n}{n p_{\text{obs}} L}},$$

1. More precisely, one can take c = 
where the infimum is taken over all score estimators $\hat{w}$. This limit is almost identical
to the minimax separation criterion (11) we derive for top-K identification, except for
a potential logarithmic factor. In fact, if the pointwise error of $\hat{w}$ is uniformly bounded
by $\sqrt{\log n / (n p_{\text{obs}} L)}$, then it necessarily achieves the minimax $\ell_2$ error. Moreover, the
pointwise error bound presents a fundamental bottleneck for top-K ranking: it will
be impossible to differentiate the $K^{\text{th}}$ and $(K+1)^{\text{th}}$ ranked items unless their score
separation exceeds the aggregate error of the corresponding score estimates for these
two items. Based on this observation, our algorithm is mainly designed to control the
elementwise estimation error. As will be seen in Section 4, the resulting estimation error
will be uniformly spread over all entries, which is optimal in both the $\ell_2$ and $\ell_\infty$ sense.


• From Coarse to Detailed Ranking. The identifiable region we present depends only
on the preference separation between items $K$ and $K + 1$. This arises since we only
intend to identify the group of top-K items without specifying the fine details within
this group; we term it “coarse top-K ranking”. In fact, our results readily uncover the
minimax separation requirements for the case where one further expects fine ordering
among these $K$ items. Specifically, this task is feasible (in the minimax sense) if and
only if

$$\Delta_i \gtrsim \sqrt{\frac{\log n}{n p_{\text{obs}} L}}, \qquad 1 \le i \le K. \tag{12}$$

In words, the feasibility of detailed top-K ranking relies on sufficient score separation
between any consecutive pair of the top-K ranked items.

• High SNR Requirement for Total Ordering. In many situations, the separation
criterion (12) immediately suggests the hardness (or even impossibility) of recovering the
ordering over all items. In fact, to figure out the total order, one expects sufficient score
separation between all pairs of consecutive items, namely,

$$\Delta_i \gtrsim \sqrt{\frac{\log n}{n p_{\text{obs}} L}}, \qquad 1 \le i < n.$$

Since the $\Delta_i$'s are defined in a normalized way (7), they necessarily satisfy

$$\sum_{i=1}^{n-1} \Delta_i = \frac{w_1 - w_n}{w_{\max}} < 1.$$

As can be easily verified, the preceding two conditions would be incompatible unless

$$L \gtrsim \frac{n \log n}{p_{\text{obs}}},$$

which imposes a fairly stringent SNR requirement. For instance, under a sparse graph
where $p_{\text{obs}} \asymp \frac{\log n}{n}$, the number of repeated comparisons (and hence the SNR) needs to
be at least on the order of $n^2$, regardless of the method employed. Such a high SNR
requirement could be increasingly more difficult to guarantee as $n$ grows.
• Passive Ranking vs. Active Ranking. In our passive ranking model, the sample
complexity requirement $n^2 p_{\text{obs}} L$ for reliable top-K identification is given by

$$n^2 p_{\text{obs}} L \gtrsim \frac{n \log n}{\Delta_K^2}.$$

In comparison, when adaptive sampling is employed for the preference-based model, the
most recent upper bound on the sample complexity (e.g. Theorem 1 of (Busa-Fekete et al.,
2013)) is on the order of

$$\sum_{i=1}^{n} \frac{\log n}{\Delta_i^2}.$$

In the challenging regime where a dominant fraction of consecutive pairs are minimally
separated (e.g. $\Delta_1 = \cdots = \Delta_{n-1}$), the above results seem to suggest that active ranking
may not outperform passive ranking, since the sample complexity reads $(n \log n) / \Delta_K^2$. For
the other extreme case where only a single pair is minimally separated (e.g. $\Delta_1 \ll \Delta_i$
($i \ge 2$)), active ranking is more appealing, because it will adaptively acquire more paired
evaluation over the minimally separated items instead of wasting samples on those pairs
that are easy to differentiate.
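The incompatibility claimed in the total-ordering remark can be checked in one line. The derivation below is an order-wise sketch: it assumes the per-gap requirement holds with some unspecified constant $c > 0$ (a symbol introduced here for illustration only) and combines it with the telescoping identity for the normalized gaps.

```latex
% Assumption for this sketch: \Delta_i \ge c \sqrt{\log n / (n p_{obs} L)} for all 1 \le i < n,
% with c > 0 an unspecified constant.
\begin{align*}
1 \;>\; \frac{w_1 - w_n}{w_{\max}}
  \;=\; \sum_{i=1}^{n-1} \Delta_i
  \;\ge\; c\,(n-1)\sqrt{\frac{\log n}{n\,p_{\mathrm{obs}}\,L}}
\quad\Longrightarrow\quad
L \;>\; \frac{c^2\,(n-1)^2 \log n}{n\,p_{\mathrm{obs}}}
  \;\asymp\; \frac{n \log n}{p_{\mathrm{obs}}}.
\end{align*}
% Substituting p_{obs} \asymp (\log n)/n yields L \gtrsim n^2.
```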


4. Ranking Scheme: Spectral Method Meets MLE 

This section presents a nearly linear-time algorithm that attempts recovery of the top-K
ranked items. The algorithm proceeds in two stages: (1) an appropriate initialization
that concentrates around the ground truth in an $\ell_2$ sense, which can be obtained via a
spectral ranking method; (2) a sequence of iterative updates sharpening the estimates in a
pointwise manner, which consists in computing coordinate-wise MLE solutions. The two
stages operate upon different sets of samples, while no further sample splitting is needed
within each stage. The combination of these two stages will be referred to as Spectral MLE.

Before continuing to describe the details of our algorithm, we introduce a few notations 
that will be used throughout. 

• $\mathcal{L}(w; y_i)$: the likelihood function of a latent preference vector $w$, given the part of
comparisons $y_i$ that have bearing on item $i$.

• $w_{\setminus i}$: for any preference vector $w$, let $w_{\setminus i}$ represent $[w_1, \cdots, w_{i-1}, w_{i+1}, \cdots, w_n]$
excluding $w_i$.

• $\mathcal{L}(\tau, w_{\setminus i}; y_i)$: with a slight abuse of notation, denote by $\mathcal{L}(\tau, w_{\setminus i}; y_i)$ the likelihood
of the preference vector $[w_1, \cdots, w_{i-1}, \tau, w_{i+1}, \cdots, w_n]$.


4.1 Algorithm: Spectral MLE 

It has been established that the spectral ranking method, particularly Rank Centrality, is
able to discover a preference vector $w$ that incurs minimal $\ell_2$ loss. To enable reliable ranking,
however, it is more desirable to obtain an estimate that is faithful in an elementwise sense.
Fortunately, the solution returned by the spectral method will serve as an ideal initial guess
to seed our algorithm. The two components of the proposed Spectral MLE are described
below.

Algorithm 1 Spectral MLE.

Input: The average comparison outcome $y_{i,j}$ for all $(i, j) \in \mathcal{E}$; the score range $[w_{\min}, w_{\max}]$.

Partition $\mathcal{E}$ randomly into two sets $\mathcal{E}^{\text{init}}$ and $\mathcal{E}^{\text{iter}}$, each containing $\frac{1}{2} |\mathcal{E}|$ edges. Denote by
$y_i^{\text{init}}$ (resp. $y_i^{\text{iter}}$) the components of $y_i$ obtained over $\mathcal{E}^{\text{init}}$ (resp. $\mathcal{E}^{\text{iter}}$).

Initialize $w^{(0)}$ to be the estimate computed by Rank Centrality on $y_i^{\text{init}}$ ($1 \le i \le n$).

Successive Refinement: for $t = 0 : T$ do

1) Compute the coordinate-wise MLE

$$w_i^{\text{mle}} \leftarrow \arg\max_{\tau \in [w_{\min}, w_{\max}]} \mathcal{L}\big(\tau, w_{\setminus i}^{(t)}; y_i^{\text{iter}}\big). \tag{13}$$

2) For each $1 \le i \le n$, set

$$w_i^{(t+1)} \leftarrow \begin{cases} w_i^{\text{mle}}, & \text{if } \big| w_i^{\text{mle}} - w_i^{(t)} \big| > \xi_t, \\ w_i^{(t)}, & \text{else}. \end{cases} \tag{14}$$

Output the indices of the $K$ largest components of $w^{(T)}$.


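Algorithm 2 Rank Centrality (Negahban et al., 2012)

Input: The average comparison outcome $y_{i,j}$ for all $(i, j) \in \mathcal{E}^{\text{init}}$.

Compute the transition matrix $P = [P_{ij}]_{1 \le i, j \le n}$ such that

$$P_{ij} = \begin{cases} \dfrac{1}{d_{\max}} \dfrac{y_{j,i}}{y_{i,j} + y_{j,i}}, & \text{if } (i, j) \in \mathcal{E}^{\text{init}}; \\[2mm] 1 - \dfrac{1}{d_{\max}} \displaystyle\sum_{k : (i,k) \in \mathcal{E}^{\text{init}}} \dfrac{y_{k,i}}{y_{i,k} + y_{k,i}}, & \text{if } i = j; \\[2mm] 0, & \text{if } i \ne j \text{ and } (i, j) \notin \mathcal{E}^{\text{init}}, \end{cases}$$

where $d_{\max}$ is the maximum out-degree of vertices in $\mathcal{E}^{\text{init}}$.

Output the stationary distribution of $P$.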
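Algorithm 2 can be prototyped with a plain power iteration. The sketch below is an illustration under stated assumptions, not the reference implementation of Negahban et al.: the function name, the dictionary input format (pairs `(i, j)` with `i < j` mapped to $y_{i,j}$), the zero-based indexing, and the fixed iteration count are all choices made here. It uses the convention $y_{i,j} + y_{j,i} = 1$, so the ratio in $P_{ij}$ reduces to $y_{j,i}$, and it assumes the comparison graph is non-empty and connected.

```python
def rank_centrality(y, n, iters=2000):
    """Approximate the stationary distribution of the Rank Centrality chain.

    y maps each observed pair (i, j), i < j, to the fraction of comparisons
    won by item i. Returns a list pi of length n that sums to 1.
    (Hypothetical helper; assumes a non-empty, connected comparison graph.)
    """
    # lose[i][j] = y_{j,i}: fraction of comparisons on edge (i, j) won by j.
    lose = [dict() for _ in range(n)]
    for (i, j), y_ij in y.items():
        lose[i][j] = 1.0 - y_ij
        lose[j][i] = y_ij
    d_max = max(len(row) for row in lose)
    # Off-diagonal transition probabilities P_ij = y_{j,i} / d_max.
    P = [{j: v / d_max for j, v in row.items()} for row in lose]
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            stay = 1.0 - sum(P[i].values())  # diagonal entry P_ii
            nxt[i] += pi[i] * stay
            for j, p in P[i].items():
                nxt[j] += pi[i] * p
        pi = nxt  # one step of pi <- pi P
    return pi
```

With noiseless inputs $y_{i,j} = w_i / (w_i + w_j)$ the chain is reversible with stationary distribution proportional to $w$, which is a convenient sanity check for any implementation.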


1. Initialization via Spectral Ranking. We generate an initialization $w^{(0)}$ via Rank
Centrality. In words, Rank Centrality proceeds by constructing a Markov chain based on
the pairwise observations, and then returning its stationary distribution by computing
the leading eigenvector of the associated probability transition matrix. For the sake
of completeness, we provide the detailed procedure of Rank Centrality in Algorithm 2.
Under the Erdős-Rényi model, the estimate $w^{(0)}$ is known to be reasonably faithful in
terms of the mean squared loss (Negahban et al., 2012), that is, with high probability,

$$\frac{\| w^{(0)} - w \|}{\| w \|} \lesssim \sqrt{\frac{\log n}{n p_{\text{obs}} L}}.$$

2. Successive Refinement via Coordinate-wise MLE. Note that the state-of-the-art
finite-sample analyses for MLE (e.g. (Negahban et al., 2012)) involve only the $\ell_2$ accuracy
of the global MLE when the locations of all samples are i.i.d. (rather than the

Spectral MLE: Top-K Rank Aggregation from Pairwise Comparisons


graph-based model considered herein). Instead of seeking a global MLE solution, we 
propose to carefully utilize the coordinate-wise MLE. Specifically, we cyclically iterate 
through each component, one at a time, maximizing the log-likelihood function with 
respect to that component. In contrast to the coordinate-descent method for solving 
the global MLE, we replace the preceding estimate with the new coordinate-wise MLE 
only when they are far apart. Theorem 8 (to be stated in Section 4.2) guarantees the
contraction of the pointwise error for each cycle, which leads to a geometric convergence 
rate. 


The algorithm then returns the indices of top-K items in accordance with the score estimate.
A formal and detailed description of the procedure is summarized in Algorithm 1.
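The refinement stage of Algorithm 1 can be sketched as follows. This is a hedged illustration rather than the authors' code: the names `coordinate_mle` and `refine`, the neighbor-dictionary format `nbrs[i] = {j: y_ij}`, the callable threshold `xi`, and the choice to run exactly `T` sweeps (so that the returned vector plays the role of $w^{(T)}$) are assumptions. The sketch also assumes the initial estimate has already been rescaled into $[w_{\min}, w_{\max}]$. The coordinate-wise maximizer is found by bisection on $\sum_j \big( y_{i,j} - \tau / (\tau + w_j) \big)$, which is $\tau$ times the derivative of the coordinate-wise log-likelihood and is decreasing in $\tau$.

```python
def coordinate_mle(i, w, nbrs, w_min, w_max, tol=1e-10):
    """Maximize the coordinate-wise BTL likelihood over tau in [w_min, w_max],
    holding all other coordinates of w fixed. nbrs[i] maps neighbor j -> y_ij."""
    def score(tau):
        # tau * d/dtau of the log-likelihood; strictly decreasing in tau.
        return sum(y_ij - tau / (tau + w[j]) for j, y_ij in nbrs[i].items())

    if score(w_min) <= 0:   # likelihood is non-increasing on the whole interval
        return w_min
    if score(w_max) >= 0:   # likelihood is non-decreasing on the whole interval
        return w_max
    lo, hi = w_min, w_max
    while hi - lo > tol:    # bisection on the sign change of score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def refine(w0, nbrs, w_min, w_max, xi, T):
    """Run T refinement sweeps; xi(t) is the replacement threshold at sweep t."""
    w = list(w0)
    for t in range(T):
        # Step 1: all coordinate-wise MLEs are computed from the current iterate.
        mle = [coordinate_mle(i, w, nbrs, w_min, w_max) for i in range(len(w))]
        # Step 2: replace a coordinate only when its MLE is far from the iterate.
        w = [m if abs(m - cur) > xi(t) else cur for m, cur in zip(mle, w)]
    return w
```

The top-K output is then the index set of the `K` largest entries of the returned vector. Because the BTL likelihood is invariant to a common rescaling of all scores, the thresholded update (rather than an unconditional one) is what keeps the iterates anchored to the scale of the initialization.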

Remark 4 We split $\mathcal{E}$ into $\mathcal{E}^{\text{init}}$ and $\mathcal{E}^{\text{iter}}$ for analytical convenience. Empirically, if we
keep $\mathcal{E}^{\text{init}} = \mathcal{E}^{\text{iter}} = \mathcal{E}$ and reuse all samples, then it seems to slightly outperform the
sample-splitting procedure. Thus, we recommend the sample-reusing procedure for practical use, and
leave the theoretical justification for future work.


Remark 5 Spectral MLE is inspired by recent advances in solving non-convex programs by
means of iterative methods (Keshavan et al., 2010, 2009; Jain et al., 2013; Candès et al.,
2014; Netrapalli et al., 2013; Balakrishnan et al., 2014). A key message conveyed from
these works is: once we arrive at an appropriate initialization (often via a spectral method),
the iterative estimates will be rapidly attracted towards the global optimum.


Remark 6 While our analysis is restricted to the Erdős-Rényi model, Spectral MLE is
applicable to general graphs. We caution, however, that spectral ranking is not guaranteed to
achieve minimal $\ell_2$ loss for general graphs and, in particular, the kind of graphs exhibiting
small spectral gaps. Therefore, Spectral MLE is not necessarily minimax optimal under
general graph patterns.

Notably, the successive refinement stage is developed based on the observation that 
we are able to characterize the confidence intervals of the coordinate-wise MLEs at each 
iteration. The role of such confidence intervals is to help detect outlier components that 
incur large pointwise loss. Since the initial guess is optimal in an overall $\ell_2$ sense, a large
fraction of its entries are already faithful relative to the ground truth. As a consequence, it 
suffices to disentangle the “sparse” set of outliers. 

One appealing feature of Spectral MLE is its low computational complexity. Recall that
the initialization step by Rank Centrality can be solved for $\epsilon$ accuracy (i.e. identifying an
estimate $\hat{w}$ such that $\|\hat{w} - w\| / \|w\| \le \epsilon$) within $O(|\mathcal{E}| \log \frac{1}{\epsilon})$ time instances by means of a
power method. In addition, for each component $i$, the coordinate-wise likelihood function
involves a sum of $\deg(i)$ terms. Since finding the coordinate-wise MLE (13) can be cast as
a one-dimensional convex program, one can get $\epsilon$ accuracy via a bisection method within
$O(\deg(i) \cdot \log \frac{1}{\epsilon})$ time. Therefore, each iteration cycle of the successive refinement stage
can be accomplished in time $O(|\mathcal{E}| \cdot \log \frac{1}{\epsilon})$.

The following theorem establishes the ranking accuracy of Spectral MLE under the BTL
model.

Theorem 7 Let $c_0, c_1, c_2, c_3 > 0$ be some universal constants. Suppose that $L = O(\mathrm{poly}(n))$,
the comparison graph $\mathcal{G} \sim \mathcal{G}_{n, p_{\text{obs}}}$ with $p_{\text{obs}} > c_0 \log n / n$, and the separation measure $\Delta_K$
satisfies

$$\Delta_K > c_1 \sqrt{\frac{\log n}{n p_{\text{obs}} L}}. \tag{15}$$

Then with probability exceeding $1 - 1/n^2$, Spectral MLE perfectly identifies the set of top-K
ranked items, provided that the algorithmic parameters obey $T \ge c_2 \log n$ and

$$\xi_t := c_3 \left\{ \xi_{\min} + \frac{1}{2^t} \left( \xi_{\max} - \xi_{\min} \right) \right\}, \tag{16}$$

where $\xi_{\min} := \sqrt{\frac{\log n}{n p_{\text{obs}} L}}$ and $\xi_{\max} := \sqrt{\frac{\log n}{p_{\text{obs}} L}}$.

Theorem 7 basically implies that the proposed algorithm succeeds in separating out 
the high-ranking objects with probability approaching one, as long as the preference score 
satisfies the separation condition 

$$\Delta_K \gtrsim \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}}.$$

Additionally, Theorem 7 asserts that the number of iteration cycles required in the second 
stage scales at most logarithmically, revealing that Spectral MLE achieves the desired 
ranking precision with nearly linear-time computational complexity. 
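
As an illustration of how the algorithmic parameters could be instantiated, the sketch below computes a threshold schedule of the form $\xi_t = c_3 \{ \xi_{\min} + 2^{-t} (\xi_{\max} - \xi_{\min}) \}$ with $\xi_{\min} = \sqrt{\log n / (n p_{\mathrm{obs}} L)}$ and $\xi_{\max} = \sqrt{\log n / (p_{\mathrm{obs}} L)}$, together with one replacement pass that swaps in the coordinate-wise MLE wherever it differs from the current estimate by more than $\xi_t$. The function names are hypothetical, the defaults $c_2 = 5$ and $c_3 = 1$ are the values used in the numerical experiments, and rounding $T$ up to an integer is an assumption of this sketch.

```python
import math

def threshold_schedule(n, p_obs, L, c2=5.0, c3=1.0):
    """Return [xi_0, ..., xi_T] with T = ceil(c2 * log n) (rounding is an assumption)."""
    xi_min = math.sqrt(math.log(n) / (n * p_obs * L))
    xi_max = math.sqrt(math.log(n) / (p_obs * L))
    T = math.ceil(c2 * math.log(n))
    return [c3 * (xi_min + (xi_max - xi_min) / 2 ** t) for t in range(T + 1)]

def refine_step(w_hat, w_mle, xi_t):
    """Keep an entry unless it is more than xi_t away from its coordinate-wise MLE."""
    return [m if abs(w - m) > xi_t else w for w, m in zip(w_hat, w_mle)]
```

The schedule starts at $c_3 \xi_{\max}$ and decays geometrically towards $c_3 \xi_{\min}$, so early cycles only replace gross outliers while later cycles apply a tighter threshold.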


4.2 Successive Refinement: Convergence and Contraction of $\ell_\infty$ Error 

In the sequel, we would like to provide some interpretation as to why we expect the pointwise 
error of the score estimates to be controllable. The argument is heuristic in nature, since we 
will assume for simplicity that each iteration employs a fresh set of samples $y$ independent 
of the present estimate $w^{(t)}$. 

Denote by $\ell^*(\tau)$ the true log-likelihood function 

$$\ell^*(\tau) := \frac{1}{L} \log \mathcal{L}\left(\tau, w_{\setminus i}; y_i\right). \tag{17}$$

Straightforward calculation suggests that its expectation around $w_i$ can be controlled through 
a locally strongly-concave function, due to the existence of a second-order lower bound 

$$\mathbb{E}_{y}\left[ \ell^*(w_i) - \ell^*(\tau) \right] = \sum_{j : (i,j) \in \mathcal{E}} \mathrm{KL}\left( \frac{w_i}{w_i + w_j} \,\Big\|\, \frac{\tau}{\tau + w_j} \right) \gtrsim |\tau - w_i|^2 \, n p_{\mathrm{obs}}, \tag{18}$$

where KL(p || q) represents the Kullback-Leibler (KL) divergence between Bernoulli(p) and 
Bernoulli(q). These calculations will be made precise in Appendix A.1 (and in particular 
Eqn. (43) and (44)). 

This measures the penalty when $\tau$ deviates from the ground truth. Note, however, that 
we don't have direct access to $\ell^*(\cdot)$ since it relies on the latent scores $w$. To obtain a 
computable surrogate, we replace $w$ with the present estimate $w^{(t)}$, resulting in the plug-in 
likelihood function 

$$\hat{\ell}_i(\tau) := \frac{1}{L} \log \mathcal{L}\left(\tau, w^{(t)}_{\setminus i}; y_i\right).$$

Fortunately, the surrogate loss incurred by employing $\hat{\ell}_i(\tau)$ is locally stable in the sense 
that 

$$\Big| \mathbb{E}_{y}\left[ \hat{\ell}_i(\tau) - \hat{\ell}_i(w_i) - \left( \ell^*(\tau) - \ell^*(w_i) \right) \right] \Big| \lesssim n p_{\mathrm{obs}} \, |\tau - w_i| \, \frac{\|w^{(t)} - w\|}{\|w\|}, \tag{19}$$

which will be made clear in Appendix A.1. This essentially means that the surrogate loss 
$\hat{\ell}_i(\tau) - \hat{\ell}_i(w_i)$ is a reasonably good approximation of the true loss $\ell^*(\tau) - \ell^*(w_i)$, as long 
as $\tau$ (resp. $w^{(t)}$) is sufficiently close to $w_i$ (resp. $w$). As a result, any candidate $\tau \ne w_i$ 
will be viewed as less likely than and hence distinguishable from the ground truth $w_i$ (i.e. 
$\hat{\ell}_i(w_i) > \hat{\ell}_i(\tau)$) in the mean sense, provided that its deviation penalty (18) dominates the 
surrogate loss (19). This would hold as long as the pointwise loss exceeds the normalized 
$\ell_2$ loss: 

$$|\tau - w_i| \gtrsim \frac{\|w^{(t)} - w\|}{\|w\|}.$$

Thus, our procedure is expected to be able to converge to a solution whose pointwise error 
is as low as the normalized $\ell_2$ error of the initial guess. 

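Encouragingly, the $\ell_\infty$ estimation error not only converges, but converges at a geometric 
rate as well. This rapid convergence property does not rely on the "fresh-sample" 
assumption imposed in the above heuristic argument, as formally stated in the following 
theorem. 

Theorem 8 Suppose that $\mathcal{G} \sim \mathcal{G}_{n, p_{\mathrm{obs}}}$ with $p_{\mathrm{obs}} > c_0 \log n / n$ for some large constant $c_0$, 
and there exists a score vector $w^{\mathrm{ub}} \in [w_{\min}, w_{\max}]^n$ independent of $\mathcal{G}$ satisfying 

$$|w_i^{\mathrm{ub}} - w_i| \le \xi w_{\max}, \quad 1 \le i \le n; \tag{20}$$

$$\frac{\|w^{\mathrm{ub}} - w\|}{\|w\|} \le \delta. \tag{21}$$

Then with probability at least $1 - c_1 n^{-4}$ for some constant $c_1 > 0$, the coordinate-wise MLE 

$$w_i^{\mathrm{mle}} := \arg\max_{\tau \in [w_{\min}, w_{\max}]} \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right) \tag{22}$$

satisfies 

$$|w_i - w_i^{\mathrm{mle}}| < \frac{20 \left( 6 + \frac{\log L}{\log n} \right) w_{\max}^5}{w_{\min}^4} \max\left\{ \delta + \frac{\log n}{n p_{\mathrm{obs}}} \xi, \; \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} \right\} \tag{23}$$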

simultaneously for all scores $\hat{w} \in [w_{\min}, w_{\max}]^n$ obeying $|\hat{w}_i - w_i| \le |w_i^{\mathrm{ub}} - w_i|$, $1 \le i \le n$. 

In the regime where $L = O(\mathrm{poly}(n))$ and $\delta \ge c_5 \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}}$, Theorem 8 asserts 
that under appropriate conditions, the coordinate-wise MLE $w^{\mathrm{mle}}$ is expected to achieve a 
lower pointwise error than $w^{(t)}$ such that 

$$\|w^{\mathrm{mle}} - w\|_\infty \lesssim \frac{\|w^{(t)} - w\|}{\|w\|} + \frac{\log n}{n p_{\mathrm{obs}}} \|w^{(t)} - w\|_\infty. \tag{24}$$

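When the replacement threshold $\xi_t$ is chosen to be on the same order as $\|w^{\mathrm{mle}} - w\|_\infty$, one 
can detect outliers and drag down the elementwise estimation error at a rate 

$$\|w^{(t+1)} - w\|_\infty \lesssim \frac{\|w^{(t)} - w\|}{\|w\|} + \frac{\log n}{n p_{\mathrm{obs}}} \|w^{(t)} - w\|_\infty. \tag{25}$$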


One important feature is that the same collection of samples can be reused across all 
iterations at the successive refinement stage, provided that we can identify in each cycle 
another slightly looser estimate that is independent of the samples. Another property that 
will be made clear in the analysis is that the $\ell_2$ estimation error obeys 

$$\frac{\|w^{(t)} - w\|}{\|w\|} \lesssim \frac{\|w^{(0)} - w\|}{\|w\|} \lesssim \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}}, \tag{26}$$

which further gives 

$$\|w^{(t+1)} - w\|_\infty \lesssim \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} + \frac{\log n}{n p_{\mathrm{obs}}} \|w^{(t)} - w\|_\infty. \tag{27}$$


We recognize that the non-negative recursive sequence $\{f_n\}$ obeying the recurrence equation 
$f_n = a + b f_{n-1}$ ($0 < b < 1$) must converge to a point$^2$ $f_\infty = \frac{a}{1-b}$. When specialized to (27), 
this fact implies that the output of Spectral MLE obeys 

$$\|w^{(T)} - w\|_\infty \lesssim \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}},$$

as long as $\log n / (n p_{\mathrm{obs}})$ is sufficiently small and $T$ is sufficiently large. That is, the output is 
minimally apart from the ground truth. 
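
The contraction argument can be checked numerically. The short Python sketch below iterates a recurrence $f_n = a + b f_{n-1}$ with $0 < b < 1$ and compares the result with the fixed point $a / (1 - b)$; the values of `a`, `b`, and `f0` in the usage lines are arbitrary illustrative numbers, not quantities from the paper.

```python
def iterate_recurrence(a, b, f0, steps):
    """Apply f <- a + b * f for the given number of steps, starting from f0."""
    f = f0
    for _ in range(steps):
        f = a + b * f
    return f

def fixed_point(a, b):
    """Limit of the recurrence when 0 < b < 1."""
    return a / (1.0 - b)

# Illustrative usage: the gap to the fixed point shrinks by a factor b per step.
gap_after_one = iterate_recurrence(0.1, 0.5, 3.0, 1) - fixed_point(0.1, 0.5)
gap_initial = 3.0 - fixed_point(0.1, 0.5)
```

Since the gap to the fixed point shrinks by the factor $b$ at every step, a number of cycles logarithmic in the initial gap suffices, which is why $T$ only needs to scale as $\log n$.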


4.3 Discussion 

Choice of Initialization. Careful readers will remark that the success of Spectral MLE can 
be guaranteed by a broader selection of initialization procedures beyond Rank Centrality. 
Indeed, Theorem 8 and subsequent analyses lead to the following assertion: as long as the 
initialization method is able to produce an initial estimate $w^{(0)}$ that is reasonably faithful 
in the $\ell_2$ sense 

$$\frac{\|w^{(0)} - w\|}{\|w\|} \lesssim \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}},$$

then Spectral MLE will converge to a pointwise optimal preference obeying 

$$\|w^{(T)} - w\|_\infty \lesssim \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}}.$$

2. To see this, one can rewrite the recurrence inequality as $f_n - \frac{a}{1-b} = b \left( f_{n-1} - \frac{a}{1-b} \right)$, which gives 
$f_n - \frac{a}{1-b} = b^n \left( f_0 - \frac{a}{1-b} \right)$. When $n$ tends to infinity, this gives $f_n - \frac{a}{1-b} = 0$. 

Figure 1: (a) Empirical $\ell_\infty$ loss v.s. $L$; (b) $\ell_\infty$ loss v.s. $p_{\mathrm{obs}}$; (c) Rate of success in top-K 
identification (n = 100, 200). 

Initialization via Global MLE. One would naturally wonder whether we can employ 
the global MLE (computed over $y^{\mathrm{init}}$) to seed the iterative refinement stage (applied over 
$y^{\mathrm{iter}}$). In fact, the state-of-the-art analysis (with a different but order-wise equivalent model) 
(Negahban et al., 2012) asserts that the global MLE satisfies the desired $\ell_2$ property 
for at least two cases: (a) complete graphs, i.e. $p_{\mathrm{obs}} = 1$; and (b) Erdős-Rényi graphs with 
no repeated comparisons, i.e. $L = 1$. In these two cases, the proposed algorithm achieves 
minimal $\ell_\infty$ errors if we initialize it via the global MLE. 

Nevertheless, whether the global MLE achieves minimal $\ell_2$ loss for other configurations 
$(L, p_{\mathrm{obs}})$ has not been established. The analytical bottleneck seems to stem from an underlying 
bias-variance tradeoff when accounting for two successive randomness mechanisms: 
the random graph $\mathcal{G}$ and the repeated comparisons generated over $\mathcal{G}$. In general, the $y_{i,j}^{(l)}$'s are 
not jointly independent unless we condition on $\mathcal{G}$. In contrast, the above two special cases 
amount to two extreme situations: (a) the randomness of $\mathcal{G}$ goes away when $p_{\mathrm{obs}} = 1$; (b) 
the condition $L = 1$ avoids repeated sampling. Nevertheless, these two cases alone (as well 
as the model in Theorem 4 of (Negahban et al., 2012)) are not sufficient in characterizing 
the complete tradeoff between graph sparsity and the quality of the acquired comparisons. 


4.4 Numerical Experiments 

A series of synthetic experiments is conducted to demonstrate the practical applicability 
of Spectral MLE. The important implementation parameters in our approach are the choices 
of $c_2$ and $c_3$ given in Theorem 7, which specify $T$ and $\xi_t$ respectively. In all numerical 
simulations performed here, we pick $c_2 = 5$ and $c_3 = 1$, and do not split samples. We focus 
on the case where $n = 100$, where each reported result is calculated by averaging over 200 
Monte Carlo trials. 

We first examine the $\ell_\infty$ error of the score estimates. The latent scores are generated 
uniformly over [0.5, 1]. For each $(p_{\mathrm{obs}}, L)$, the paired comparisons are randomly generated 
as per the BTL model, and we perform score inference by means of both Rank Centrality 
and Spectral MLE. Fig. 1(a) (resp. Fig. 1(b)) illustrates the empirical tradeoff between 
the pointwise score estimation accuracy and the number $L$ of repeated comparisons (resp. 
graph sparsity $p_{\mathrm{obs}}$). It can be seen from these plots that the proposed Spectral MLE 
outperforms Rank Centrality uniformly over all configurations, corroborating our theoretical 
results. Interestingly, the performance gain is the most significant under sparse graphs in 
the presence of low-resolution comparisons (i.e. when $p_{\mathrm{obs}}$ and $L$ are small). 

Next, we study the success rate of top-K identification as the number $n$ of items varies. 
We generate the latent scores randomly over [0.5, 1], except that a separation $\Delta_K$ is imposed 
between items $K$ and $K+1$. The results are shown in Fig. 1(c) for the case where $p_{\mathrm{obs}} = 0.2$ 
and $L = 5$. As can be seen, Spectral MLE achieves higher ranking accuracy compared to 
Rank Centrality for all these situations. Interestingly, the benefit of Spectral MLE relative 
to Rank Centrality is more apparent in the regime where the score separation is small. In 
addition, it seems that Rank Centrality is capable of achieving good ranking accuracy in 
the randomized model we simulate, and we leave the theoretical analysis for future work. 
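
For readers who wish to reproduce a setup of this kind, the following Python sketch generates synthetic BTL data: latent scores drawn uniformly over [0.5, 1], an Erdős-Rényi comparison graph with edge probability `p_obs`, and `L` independent comparisons per observed pair. The function name `simulate_btl`, the dictionary layout of `y`, and the use of the standard-library generator are assumptions of this sketch and do not describe the authors' experimental code; the imposed separation between items $K$ and $K+1$ is omitted for brevity.

```python
import random

def simulate_btl(n, p_obs, L, seed=0):
    """Return (w, y): latent scores and averaged pairwise outcomes.

    y[(i, j)] is the fraction of the L comparisons in which item i beats item j,
    stored for both orderings of every observed pair.
    """
    rng = random.Random(seed)
    w = [rng.uniform(0.5, 1.0) for _ in range(n)]
    y = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_obs:              # edge (i, j) is observed
                p_win = w[i] / (w[i] + w[j])      # BTL probability that i beats j
                wins = sum(rng.random() < p_win for _ in range(L))
                y[(i, j)] = wins / L
                y[(j, i)] = 1.0 - wins / L
    return w, y
```

A call such as `simulate_btl(100, 0.2, 5)` mirrors the configuration $n = 100$, $p_{\mathrm{obs}} = 0.2$, $L = 5$ discussed above.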


5. Conclusion 

This paper investigates rank aggregation from pairwise data that emphasizes the top-K 
items. We developed a nearly linear-time algorithm that performs as well as the best model-aware 
paradigm, from a minimax perspective. The proposed algorithm returns the indices 
of the best-K items in accordance to a carefully tuned preference score estimate, which is 
obtained by combining a spectral method and a coordinate-wise MLE. Our results uncover 
the identifiability limit of top-K ranking, which is dictated by the preference separation 
between the $K$-th and $(K+1)$-th items. 

This paper comes with some limitations in developing tight sample complexity bounds 
under general graphs. The performances of Spectral MLE under other sampling models 
are worth investigating (Osting et al., 2015). In addition, it remains to characterize both 
statistical and computational ranking limits for other choice models (e.g. the Plackett-Luce 
model (Hajek et al., 2014)). It would also be interesting to consider the case where 
the paired comparisons are drawn from a mixture of BTL models (e.g. (Oh and Shah, 
2014)), as well as the collaborative ranking setting where one aggregates the item preferences 
from a pool of different users in order to infer rankings for each individual user (e.g. 
(Lu and Negahban, 2014; Park et al., 2015)). 

Appendix A. Performance Guarantees for Spectral MLE 

In this section, we establish the theoretical guarantees of Spectral MLE in controlling the 
ranking accuracy and estimation errors, which are the subjects of Theorem 7 and Theorem 8. 
The proof of Theorem 7 relies heavily on the claim of Theorem 8; for this reason, 
we present the proofs of Theorem 7 and Theorem 8 in a reverse order. Before proceeding, 
we recall that the coordinate-wise log-likelihood of $\tau$ is given by 

$$\log \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right) := L \sum_{j : (i,j) \in \mathcal{E}} \left\{ y_{i,j} \log \frac{\tau}{\tau + \hat{w}_j} + (1 - y_{i,j}) \log \frac{\hat{w}_j}{\tau + \hat{w}_j} \right\}, \tag{29}$$

and we shall use $w_{\setminus i}$ (resp. $\hat{w}_{\setminus i}$) to denote the vector $w = [w_1, \cdots, w_n]$ (resp. $\hat{w} = 
[\hat{w}_1, \cdots, \hat{w}_n]$) excluding the entry $w_i$ (resp. $\hat{w}_i$). 


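A.1 Proof of Theorem 8 

To prove Theorem 8, we need to demonstrate that for every $\tau \in [w_{\min}, w_{\max}]$ that is sufficiently 
separated from $w_i$ (or, more formally, $|\tau - w_i| \gtrsim \max\left\{ \delta + \frac{\log n}{n p_{\mathrm{obs}}} \xi, \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} \right\}$), the 
coordinate-wise likelihood satisfies 

$$\log \mathcal{L}\left(w_i, \hat{w}_{\setminus i}; y_i\right) > \log \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right) \tag{30}$$

and, therefore, $\tau$ cannot be the coordinate-wise MLE. 

To begin with, we provide a lemma (which will be proved later) that concerns (30) for 
any single $\tau$ that is well separated from $w_i$. 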


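Lemma 9 Fix any $\gamma \ge 3$. Under the conditions of Theorem 8, for any $\tau \in [w_{\min}, w_{\max}]$ 
obeying 

$$|w_i - \tau| > \gamma \frac{w_{\max}^5}{w_{\min}^4} \max\left\{ \frac{25}{4} \left( \delta + \frac{\log n}{n p_{\mathrm{obs}}} \xi \right), \; 20 \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} \right\}, \tag{31}$$

one has 

$$\frac{1}{L} \log \mathcal{L}\left(w_i, \hat{w}_{\setminus i}; y_i\right) - \frac{1}{L} \log \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right) > \frac{w_{\max}^6}{100 w_{\min}^6} \frac{\log n}{L} \tag{32}$$

with probability exceeding $1 - 6 n^{-\gamma}$; this holds simultaneously for all $\hat{w} \in [w_{\min}, w_{\max}]^n$ 
satisfying $|\hat{w}_i - w_i| \le |w_i^{\mathrm{ub}} - w_i|$, $1 \le i \le n$. 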


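To establish Theorem 8, we still need to derive a uniform control over all $\tau$ satisfying 
(31). This will be accomplished via a standard covering argument. Specifically, for any small 
quantity $\epsilon > 0$, we construct a set $\mathcal{M}_\epsilon$ (called an $\epsilon$-cover) within the interval $[w_{\min}, w_{\max}]$ 
such that for any $\tau \in [w_{\min}, w_{\max}]$, there exists a $\tau_0 \in \mathcal{M}_\epsilon$ obeying 

$$|\tau - \tau_0| \le \epsilon \quad \text{and} \quad |\tau_0 - w_i| \ge |\tau - w_i|. \tag{33}$$

It is self-evident that one can produce such a cover $\mathcal{M}_\epsilon$ with cardinality $\frac{w_{\max} - w_{\min}}{\epsilon} + 1$. If we 
set $\gamma = 6 + \frac{\log L}{\log n}$ in Lemma 9 (which obeys $\gamma = O(1)$ since $L = O(\mathrm{poly}(n))$), taking the 
union bound over $\mathcal{M}_\epsilon$ gives 

$$\frac{1}{L} \log \mathcal{L}\left(w_i, \hat{w}_{\setminus i}; y_i\right) - \frac{1}{L} \log \mathcal{L}\left(\tau_0, \hat{w}_{\setminus i}; y_i\right) > \frac{w_{\max}^6}{100 w_{\min}^6} \frac{\log n}{L} \tag{34}$$

simultaneously over all $\tau_0 \in \mathcal{M}_\epsilon$ obeying 

$$|w_i - \tau_0| > \left( 6 + \frac{\log L}{\log n} \right) \frac{w_{\max}^5}{w_{\min}^4} \max\left\{ \frac{25}{4} \left( \delta + \frac{\log n}{n p_{\mathrm{obs}}} \xi \right), \; 20 \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} \right\};$$

this occurs with probability at least $1 - 6 |\mathcal{M}_\epsilon| \, n^{-6 - \frac{\log L}{\log n}}$. 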

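We proceed by bounding the difference between $\log \mathcal{L}(\tau, \hat{w}_{\setminus i}; y_i)$ and $\log \mathcal{L}(\tau_0, \hat{w}_{\setminus i}; y_i)$ 
for any $|\tau - \tau_0| \le \epsilon$. To achieve this, we first recognize that the Lipschitz constant of 
$\frac{1}{L} \log \mathcal{L}(\tau, \hat{w}_{\setminus i}; y_i)$ (cf. (29)) w.r.t. $\tau$ is bounded above by the following inequality: 

$$\left| \frac{1}{L} \frac{\partial \log \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right)}{\partial \tau} \right| = \left| \sum_{j : (i,j) \in \mathcal{E}} \left\{ y_{i,j} \left( \frac{1}{\tau} - \frac{1}{\tau + \hat{w}_j} \right) - (1 - y_{i,j}) \frac{1}{\tau + \hat{w}_j} \right\} \right| \overset{(a)}{\le} \deg(i) \cdot \frac{2}{w_{\min}} \overset{(b)}{\le} \frac{12 n p_{\mathrm{obs}}}{5 w_{\min}},$$

where (a) follows since 

$$\left| y_{i,j} \left( \frac{1}{\tau} - \frac{1}{\tau + \hat{w}_j} \right) - (1 - y_{i,j}) \frac{1}{\tau + \hat{w}_j} \right| = \left| \frac{y_{i,j}}{\tau} - \frac{1}{\tau + \hat{w}_j} \right| \le \frac{y_{i,j}}{\tau} + \frac{1}{\tau + \hat{w}_j} \le \frac{2}{w_{\min}},$$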


and (b) holds since deg(i) < 2.4npobs with probability 1 — O (n as long as 

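$\frac{\log n}{n p_{\mathrm{obs}}}$ is sufficiently small. As a result, by picking 

$$\epsilon = \frac{w_{\max}^6 \log n}{100 w_{\min}^6 L} \cdot \frac{5 w_{\min}}{12 n p_{\mathrm{obs}}} = \frac{w_{\max}^6 \log n}{240 w_{\min}^5 \, n p_{\mathrm{obs}} L}, \tag{35}$$

one can make sure that for any $|\tau - \tau_0| \le \epsilon$, 

$$\left| \frac{1}{L} \log \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right) - \frac{1}{L} \log \mathcal{L}\left(\tau_0, \hat{w}_{\setminus i}; y_i\right) \right| \le \frac{12 n p_{\mathrm{obs}}}{5 w_{\min}} \epsilon \tag{36}$$

$$\Longrightarrow \quad \frac{1}{L} \log \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right) \le \frac{1}{L} \log \mathcal{L}\left(\tau_0, \hat{w}_{\setminus i}; y_i\right) + \frac{w_{\max}^6}{100 w_{\min}^6} \frac{\log n}{L}. \tag{37}$$

In addition, with the above choice of $\epsilon$, the cardinality of the $\epsilon$-cover is bounded above by 

$$\frac{w_{\max} - w_{\min}}{\epsilon} + 1 < \frac{w_{\max}}{\epsilon} + 1 = \frac{240 w_{\min}^5 \, n p_{\mathrm{obs}} L}{w_{\max}^5 \log n} + 1 \le n^2 L$$

for any sufficiently large $n$. 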

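Putting (34) and (37) together suggests that for all $\tau \in [w_{\min}, w_{\max}]$ sufficiently apart 
from the ground truth $w_i$, namely, 

$$\forall \tau \in [w_{\min}, w_{\max}] : \; |\tau - w_i| > \left( 6 + \frac{\log L}{\log n} \right) \frac{w_{\max}^5}{w_{\min}^4} \max\left\{ \frac{25}{4} \left( \delta + \frac{\log n}{n p_{\mathrm{obs}}} \xi \right), \; 20 \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} \right\}, \tag{38}$$

one necessarily has 

$$\frac{1}{L} \log \mathcal{L}\left(w_i, \hat{w}_{\setminus i}; y_i\right) - \frac{1}{L} \log \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right) = \left\{ \frac{1}{L} \log \mathcal{L}\left(w_i, \hat{w}_{\setminus i}; y_i\right) - \frac{1}{L} \log \mathcal{L}\left(\tau_0, \hat{w}_{\setminus i}; y_i\right) \right\} + \left\{ \frac{1}{L} \log \mathcal{L}\left(\tau_0, \hat{w}_{\setminus i}; y_i\right) - \frac{1}{L} \log \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right) \right\} > 0, \tag{39}$$
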

with probability at least 1 — 6 |A4| n~^~ — O {n~^) > 1 — 6n^Ln~^~ l°g" — 0{n~^) = 

$1 - O(n^{-4})$. Consequently, any $\tau \in [w_{\min}, w_{\max}]$ that obeys (38) cannot be the coordinate-wise 
MLE, which in turn justifies the claim (23) of Theorem 8 (we present Theorem 8 using 
slightly looser constants for notational simplicity). 

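Proof [of Lemma 9] We start by evaluating the true coordinate-wise likelihood gap 

$$\log \mathcal{L}\left(w_i, w_{\setminus i}; y_i\right) - \log \mathcal{L}\left(\tau, w_{\setminus i}; y_i\right) \tag{40}$$

for any fixed $\tau \ne w_i$ independent of $y_i$. Here, $y_i := \{ y_{i,j} \mid j : (i,j) \in \mathcal{E} \}$ is assumed to be 
generated under the BTL model parameterized by $w$, which clearly obeys 

$$\mathbb{E}\left[ y_{i,j} \right] = \frac{w_i}{w_i + w_j} \quad \text{and} \quad \mathrm{Var}\left[ y_{i,j} \right] = \frac{1}{L} \frac{w_i w_j}{(w_i + w_j)^2}.$$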

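In order to calculate the mean of (40), we rewrite the likelihood function as 

$$\frac{1}{L} \log \mathcal{L}\left(\tau, w_{\setminus i}; y_i\right) = \sum_{j : (i,j) \in \mathcal{E}} \left\{ y_{i,j} \log \frac{\tau}{\tau + w_j} + (1 - y_{i,j}) \log \frac{w_j}{\tau + w_j} \right\} \tag{41}$$

$$= \sum_{j : (i,j) \in \mathcal{E}} \left\{ y_{i,j} \log \frac{\tau}{w_j} + \log \frac{w_j}{\tau + w_j} \right\}. \tag{42}$$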

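Taking expectation w.r.t. $y_i$ using the form (41) reveals that 

$$\mathbb{E}\left[ \frac{1}{L} \log \mathcal{L}\left(w_i, w_{\setminus i}; y_i\right) - \frac{1}{L} \log \mathcal{L}\left(\tau, w_{\setminus i}; y_i\right) \right] = \sum_{j : (i,j) \in \mathcal{E}} \left\{ \frac{w_i}{w_i + w_j} \log \frac{\frac{w_i}{w_i + w_j}}{\frac{\tau}{\tau + w_j}} + \frac{w_j}{w_i + w_j} \log \frac{\frac{w_j}{w_i + w_j}}{\frac{w_j}{\tau + w_j}} \right\} = \sum_{j : (i,j) \in \mathcal{E}} \mathrm{KL}\left( \frac{w_i}{w_i + w_j} \,\Big\|\, \frac{\tau}{\tau + w_j} \right), \tag{43}$$

where KL(p || q) stands for the KL divergence of Bernoulli(q) from Bernoulli(p). Using 
Pinsker's inequality (e.g. (Yeung, 2008, Theorem 2.33)), that is, $\mathrm{KL}(p \| q) \ge 2 (p - q)^2$, we 
arrive at the following lower bound 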

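$$\mathbb{E}\left[ \frac{1}{L} \log \mathcal{L}\left(w_i, w_{\setminus i}; y_i\right) - \frac{1}{L} \log \mathcal{L}\left(\tau, w_{\setminus i}; y_i\right) \right] \ge 2 \sum_{j : (i,j) \in \mathcal{E}} \left( \frac{w_i}{w_i + w_j} - \frac{\tau}{\tau + w_j} \right)^2 = 2 (w_i - \tau)^2 \sum_{j : (i,j) \in \mathcal{E}} \frac{w_j^2}{(w_i + w_j)^2 (\tau + w_j)^2}. \tag{44}$$

That being said, the true coordinate-wise likelihood of $w_i$ strictly dominates that of $\tau$ in 
the mean sense. 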
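
The two ingredients of this lower bound are easy to sanity-check numerically. The Python sketch below evaluates the Bernoulli KL divergence (in nats) and compares it with the Pinsker bound $2 (p - q)^2$ for $p = \frac{w_i}{w_i + w_j}$ and $q = \frac{\tau}{\tau + w_j}$. The helper names `kl_bernoulli` and `pinsker_gap` are hypothetical, and the check is only an illustration over sample values, not a proof.

```python
import math

def kl_bernoulli(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in nats, for p and q strictly inside (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def pinsker_gap(w_i, tau, w_j):
    """KL minus the Pinsker lower bound for one comparison; should be non-negative."""
    p = w_i / (w_i + w_j)
    q = tau / (tau + w_j)
    return kl_bernoulli(p, q) - 2 * (p - q) ** 2
```

For scores confined to $[w_{\min}, w_{\max}]$ with $w_{\min} > 0$, both $p$ and $q$ stay strictly inside $(0, 1)$, so the logarithms are well defined.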

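Nevertheless, when running Spectral MLE, we do not have access to the ground truth 
scores $w$; what we actually compute is $\mathcal{L}(w_i, \hat{w}_{\setminus i}; y_i)$ (resp. $\mathcal{L}(\tau, \hat{w}_{\setminus i}; y_i)$) rather than 
$\mathcal{L}(w_i, w_{\setminus i}; y_i)$ (resp. $\mathcal{L}(\tau, w_{\setminus i}; y_i)$). Fortunately, such surrogate likelihoods are sufficiently close 
to the true coordinate-wise likelihoods, which we will show in the rest of the proof. For 
brevity, we shall denote respectively the heuristic and true log-likelihood functions by 

$$\hat{\ell}_i(\tau) := \frac{1}{L} \log \mathcal{L}\left(\tau, \hat{w}_{\setminus i}; y_i\right), \qquad \ell^*(\tau) := \frac{1}{L} \log \mathcal{L}\left(\tau, w_{\setminus i}; y_i\right) \tag{45}$$

whenever it is clear from context. Note that $\hat{w}$ could depend on $y_i$. 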

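As seen from (42), for any candidate $\tau \in [w_{\min}, w_{\max}]$, we can quantify the difference 
between $\hat{\ell}_i(\tau)$ and $\ell^*(\tau)$ as 

$$\hat{\ell}_i(\tau) - \ell^*(\tau) = \sum_{j : (i,j) \in \mathcal{E}} \left\{ y_{i,j} \log \frac{w_j}{\hat{w}_j} + \log \frac{\hat{w}_j}{\tau + \hat{w}_j} - \log \frac{w_j}{\tau + w_j} \right\}. \tag{46}$$

As a consequence, the gap between the true loss $\ell^*(w_i) - \ell^*(\tau)$ and the surrogate loss 
$\hat{\ell}_i(w_i) - \hat{\ell}_i(\tau)$ is given by 

$$\hat{\ell}_i(w_i) - \hat{\ell}_i(\tau) - \left( \ell^*(w_i) - \ell^*(\tau) \right) = \hat{\ell}_i(w_i) - \ell^*(w_i) - \left( \hat{\ell}_i(\tau) - \ell^*(\tau) \right)$$

$$= \sum_{j : (i,j) \in \mathcal{E}} \left\{ \log \frac{\hat{w}_j}{w_i + \hat{w}_j} - \log \frac{w_j}{w_i + w_j} - \log \frac{\hat{w}_j}{\tau + \hat{w}_j} + \log \frac{w_j}{\tau + w_j} \right\} \tag{47}$$

$$= \sum_{j : (i,j) \in \mathcal{E}} \left\{ \log \frac{\tau + \hat{w}_j}{w_i + \hat{w}_j} - \log \frac{\tau + w_j}{w_i + w_j} \right\}. \tag{48}$$

This gap thus relies on the function 

$$g(t) := \log \frac{\tau + t}{w_i + t} - \log \frac{\tau + w_j}{w_i + w_j}, \quad t \in [w_{\min}, w_{\max}],$$

which apparently obeys the following two properties: (i) $g(w_j) = 0$; (ii) 

$$\left| \frac{\partial g(t)}{\partial t} \right| = \left| \frac{1}{\tau + t} - \frac{1}{w_i + t} \right| = \frac{|\tau - w_i|}{(w_i + t)(\tau + t)} \le \frac{|\tau - w_i|}{4 w_{\min}^2}, \quad \forall t \in [w_{\min}, w_{\max}].$$

Taken together these two properties demonstrate that 

$$|g(t)| \le |g(w_j)| + |t - w_j| \cdot \sup_{t} \left| \frac{\partial g(t)}{\partial t} \right| \le \frac{1}{4 w_{\min}^2} |\tau - w_i| \, |t - w_j|, \quad \forall t \in [w_{\min}, w_{\max}]. \tag{49}$$

Substitution into (48) gives 

$$\left| \hat{\ell}_i(w_i) - \hat{\ell}_i(\tau) - \left( \ell^*(w_i) - \ell^*(\tau) \right) \right| \le \frac{|\tau - w_i|}{4 w_{\min}^2} \sum_{j : (i,j) \in \mathcal{E}} |\hat{w}_j - w_j| \le \frac{|\tau - w_i|}{4 w_{\min}^2} \sum_{j : (i,j) \in \mathcal{E}} \left| w_j^{\mathrm{ub}} - w_j \right|. \tag{50}$$

Notably, this is a deterministic inequality which holds for all $\hat{w}$ obeying $|\hat{w}_j - w_j| \le 
|w_j^{\mathrm{ub}} - w_j|$, $1 \le j \le n$. A desired property of the upper bound (50) is that it is independent 
of $\mathcal{G}$ and the data $y_i$ due to our assumption on $w^{\mathrm{ub}}$. 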

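We now move on to develop an upper bound on (50). From our assumptions on the 
initial estimate, we have 

$$\|\hat{w} - w\|^2 \le \|w^{\mathrm{ub}} - w\|^2 \le \delta^2 \|w\|^2 \le n w_{\max}^2 \delta^2.$$

Since $\mathcal{G}$ and $w^{\mathrm{ub}}$ are statistically independent, this inequality immediately gives rise to the 
following two consequences: 

$$\mathbb{E}\left[ \sum_{j : (i,j) \in \mathcal{E}} \left| w_j^{\mathrm{ub}} - w_j \right| \right] = p_{\mathrm{obs}} \|w^{\mathrm{ub}} - w\|_1 \le p_{\mathrm{obs}} \sqrt{n} \|w^{\mathrm{ub}} - w\| \le n p_{\mathrm{obs}} w_{\max} \delta \tag{51}$$

and 

$$\mathbb{E}\left[ \sum_{j : (i,j) \in \mathcal{E}} \left| w_j^{\mathrm{ub}} - w_j \right|^2 \right] = p_{\mathrm{obs}} \|w^{\mathrm{ub}} - w\|^2 \le n p_{\mathrm{obs}} w_{\max}^2 \delta^2. \tag{52}$$

Recall our assumption that $\max_j |w_j^{\mathrm{ub}} - w_j| \le \xi w_{\max}$. For any fixed $\gamma \ge 3$, if $p_{\mathrm{obs}} > \frac{2 \log n}{n}$, 
then with probability at least $1 - 2 n^{-\gamma}$, 

$$\sum_{j : (i,j) \in \mathcal{E}} \left| w_j^{\mathrm{ub}} - w_j \right| \overset{(i)}{\le} \mathbb{E}\left[ \sum_{j : (i,j) \in \mathcal{E}} \left| w_j^{\mathrm{ub}} - w_j \right| \right] + \sqrt{2 \gamma \log n \cdot \mathbb{E}\left[ \sum_{j : (i,j) \in \mathcal{E}} \left| w_j^{\mathrm{ub}} - w_j \right|^2 \right]} + \frac{2\gamma}{3} \xi w_{\max} \log n$$

$$\le n p_{\mathrm{obs}} w_{\max} \delta + \sqrt{2 \gamma \, n p_{\mathrm{obs}} \log n} \; w_{\max} \delta + \frac{2\gamma}{3} \xi w_{\max} \log n$$

$$\overset{(ii)}{\le} (1 + \sqrt{\gamma}) \, n p_{\mathrm{obs}} w_{\max} \delta + \frac{2\gamma}{3} \xi w_{\max} \log n$$

$$\overset{(iii)}{\le} \gamma \, n p_{\mathrm{obs}} w_{\max} \delta + \gamma \, \xi w_{\max} \log n,$$

where (i) comes from the Bernstein inequality as given in Lemma 11, (ii) follows since 
$\log n < \frac{n p_{\mathrm{obs}}}{2}$ by assumption, and (iii) arises since $1 + \sqrt{\gamma} \le \gamma$ whenever $\gamma \ge 3$. This 
combined with (50) allows us to control 

$$\left| \hat{\ell}_i(w_i) - \hat{\ell}_i(\tau) - \left( \ell^*(w_i) - \ell^*(\tau) \right) \right| \le \frac{|\tau - w_i| \, \gamma w_{\max}}{4 w_{\min}^2} \left( n p_{\mathrm{obs}} \delta + \xi \log n \right) \tag{53}$$

with high probability. 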

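The above arguments basically reveal that $\hat{\ell}_i(w_i) - \hat{\ell}_i(\tau)$ is reasonably close to $\ell^*(w_i) - 
\ell^*(\tau)$. Thus, to show that $\hat{\ell}_i(w_i) - \hat{\ell}_i(\tau) > 0$, it is sufficient to develop a lower bound on 
$\ell^*(w_i) - \ell^*(\tau)$ that exceeds the gap (53). In expectation, the preceding inequality (44) gives 

$$\mathbb{E}\left[ \ell^*(w_i) - \ell^*(\tau) \mid \mathcal{G} \right] \ge 2 (w_i - \tau)^2 \sum_{j : (i,j) \in \mathcal{E}} \frac{w_j^2}{(w_i + w_j)^2 (\tau + w_j)^2} \ge \frac{w_{\min}^2}{8 w_{\max}^4} (w_i - \tau)^2 \deg(i). \tag{54}$$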


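Recognizing that $y_{i,j} = \frac{1}{L} \sum_{l=1}^{L} y_{i,j}^{(l)}$ is a sum of independent random variables $y_{i,j}^{(l)} \sim 
\mathrm{Bernoulli}\left( \frac{w_i}{w_i + w_j} \right)$, we can control the conditional variance as 

$$\mathrm{Var}\left[ \ell^*(w_i) - \ell^*(\tau) \mid \mathcal{G} \right] \overset{(a)}{=} \mathrm{Var}\left[ \sum_{j : (i,j) \in \mathcal{E}} y_{i,j} \log \frac{w_i}{\tau} \,\Big|\, \mathcal{G} \right] = \frac{1}{L} \log^2\left( \frac{w_i}{\tau} \right) \sum_{j : (i,j) \in \mathcal{E}} \frac{w_i w_j}{(w_i + w_j)^2}$$

$$\overset{(b)}{\le} \frac{1}{L} \frac{(w_i - \tau)^2}{\min\{ w_i^2, \tau^2 \}} \sum_{j : (i,j) \in \mathcal{E}} \frac{w_i w_j}{(w_i + w_j)^2} \le \frac{w_{\max}^2}{4 w_{\min}^4 L} (w_i - \tau)^2 \deg(i), \tag{55}$$

where (a) is an immediate consequence of (42), and (b) follows since $\log \frac{\beta}{\alpha} \le \frac{\beta - \alpha}{\alpha}$ for any 
$\beta \ge \alpha > 0$. Note that $0 \le \frac{1}{L} y_{i,j}^{(l)} \le \frac{1}{L}$; hence each summand of $\ell^*(w_i) - \ell^*(\tau)$ (written 
in terms of a weighted sum of $y_{i,j}^{(l)}$) is bounded in magnitude by 

$$\max_{l} \frac{1}{L} y_{i,j}^{(l)} \left| \log \frac{w_i}{\tau} \right| \le \frac{1}{L} \left| \log \frac{w_i}{\tau} \right| \le \frac{1}{L} \frac{|w_i - \tau|}{w_{\min}}, \tag{56}$$

where the last inequality follows again from the inequality $\log \frac{\beta}{\alpha} \le \frac{\beta - \alpha}{\alpha}$ for any $\beta \ge \alpha > 
0$. Making use of the Bernstein inequality together with (54)-(56) suggests that: conditional 
on $\mathcal{G}$, 

$$\ell^*(w_i) - \ell^*(\tau) \ge \mathbb{E}\left[ \ell^*(w_i) - \ell^*(\tau) \mid \mathcal{G} \right] - \sqrt{2 \gamma \, \mathrm{Var}\left[ \ell^*(w_i) - \ell^*(\tau) \mid \mathcal{G} \right] \log n} - \frac{2 \gamma \log n}{3 L} \left| \log \frac{w_i}{\tau} \right|$$

$$\ge \frac{w_{\min}^2}{8 w_{\max}^4} (w_i - \tau)^2 \deg(i) - \sqrt{\frac{\gamma}{2}} \, \frac{w_{\max} |w_i - \tau|}{w_{\min}^2} \sqrt{\frac{\deg(i) \log n}{L}} - \frac{2 \gamma |w_i - \tau| \log n}{3 L w_{\min}} \tag{57}$$

holds with probability at least $1 - 2 n^{-\gamma}$. The above bound relies on $\deg(i)$, which is 
on the order of $n p_{\mathrm{obs}}$ with high probability. More precisely, taking the Chernoff bound 
(Mitzenmacher and Upfal, 2005, Corollary 4.6) as well as the union bound reveals that: if 
$n p_{\mathrm{obs}} / \log n$ is sufficiently large, then 

$$\frac{4}{5} n p_{\mathrm{obs}} \le \deg(i) \le \frac{6}{5} n p_{\mathrm{obs}}, \quad 1 \le i \le n \tag{58}$$

with probability at least $1 - 2 n^{-\gamma}$. This taken collectively with (57) and the assumption 
$n p_{\mathrm{obs}} > 2 \log n$ implies that 

$$\ell^*(w_i) - \ell^*(\tau) \ge \frac{w_{\min}^2}{10 w_{\max}^4} (w_i - \tau)^2 n p_{\mathrm{obs}} - \sqrt{\frac{\gamma}{2}} \, \frac{w_{\max} |w_i - \tau|}{w_{\min}^2} \sqrt{\frac{6 n p_{\mathrm{obs}} \log n}{5 L}} - \frac{2 \gamma |w_i - \tau| \log n}{3 L w_{\min}}$$

$$\ge \frac{w_{\min}^2}{10 w_{\max}^4} (w_i - \tau)^2 n p_{\mathrm{obs}} - \left( \sqrt{\frac{3 \gamma}{5}} + \frac{2 \gamma}{3 \sqrt{2}} \right) \frac{w_{\max} |w_i - \tau|}{w_{\min}^2} \sqrt{\frac{n p_{\mathrm{obs}} \log n}{L}}$$

$$\ge \frac{w_{\min}^2}{10 w_{\max}^4} (w_i - \tau)^2 n p_{\mathrm{obs}} - \gamma \, \frac{w_{\max} |w_i - \tau|}{w_{\min}^2} \sqrt{\frac{n p_{\mathrm{obs}} \log n}{L}} \tag{59}$$

$$\ge \frac{w_{\min}^2}{20 w_{\max}^4} (w_i - \tau)^2 n p_{\mathrm{obs}} \tag{60}$$

with probability at least $1 - 4 n^{-\gamma}$, as long as 

$$\gamma \cdot \frac{w_{\max} |w_i - \tau|}{w_{\min}^2} \sqrt{\frac{n p_{\mathrm{obs}} \log n}{L}} \le \frac{w_{\min}^2}{20 w_{\max}^4} (w_i - \tau)^2 n p_{\mathrm{obs}}$$

or, equivalently, 

$$|w_i - \tau| \ge \frac{20 \gamma \cdot w_{\max}^5}{w_{\min}^4} \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}}. \tag{61}$$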


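Finally, we are ready to control $\hat{\ell}_i(w_i) - \hat{\ell}_i(\tau)$ from below. Putting (53) and (60) 
together, we see that with high probability, 

$$\hat{\ell}_i(w_i) - \hat{\ell}_i(\tau) \ge \ell^*(w_i) - \ell^*(\tau) - \frac{|\tau - w_i| \, \gamma w_{\max}}{4 w_{\min}^2} \left( n p_{\mathrm{obs}} \delta + \xi \log n \right)$$

$$\ge \frac{w_{\min}^2}{20 w_{\max}^4} (w_i - \tau)^2 n p_{\mathrm{obs}} - \frac{|\tau - w_i| \, \gamma w_{\max}}{4 w_{\min}^2} \left( n p_{\mathrm{obs}} \delta + \xi \log n \right)$$

$$\ge \frac{w_{\min}^2}{100 w_{\max}^4} (w_i - \tau)^2 n p_{\mathrm{obs}} \tag{62}$$

$$> \frac{w_{\max}^6}{100 w_{\min}^6} \frac{\log n}{L}, \tag{63}$$

where (62) holds under the condition 

$$|\tau - w_i| \ge \frac{25 \gamma w_{\max}^5}{4 w_{\min}^4} \left( \delta + \frac{\log n}{n p_{\mathrm{obs}}} \xi \right) \tag{64}$$
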

and (63) follows from the assumption (61). The claim (32) is then established under the 
conditions (61) and (64). 

A.2 Proof of Theorem 7 

The accuracy of top-K identification is closely related to the $\ell_\infty$ error of the score estimate. 
In the sequel, we shall assume that $w_{\max} = 1$ to simplify presentation. Our goal is to 
demonstrate that 

$$\|w^{(t)} - w\|_\infty \lesssim \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} + \frac{1}{2^t} \sqrt{\frac{\log n}{p_{\mathrm{obs}} L}} \asymp \xi_t, \quad \forall t \ge 0, \tag{65}$$

where 

$$\xi_t := c_3 \left\{ \xi_{\min} + \frac{1}{2^t} \left( \xi_{\max} - \xi_{\min} \right) \right\} \tag{66}$$

with $\xi_{\min} = \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}}$ and $\xi_{\max} = \sqrt{\frac{\log n}{p_{\mathrm{obs}} L}}$. If $T \ge c_2 \log n$ for some sufficiently large $c_2 > 0$, 
then this gives 

$$\|w^{(T)} - w\|_\infty \lesssim \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} \asymp \xi_T.$$

The key implication is the following: if $w_K - w_{K+1} \ge c_1 \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}}$ for some sufficiently large 
$c_1 > 0$, then 

$$w_i^{(T)} - w_j^{(T)} \ge w_i - w_j - \left| w_i^{(T)} - w_i \right| - \left| w_j^{(T)} - w_j \right| \ge w_K - w_{K+1} - 2 \|w^{(T)} - w\|_\infty > 0$$

for all $1 \le i \le K$ and $j \ge K + 1$, indicating that Spectral MLE will output the first $K$ 
items as desired. The remaining proof then comes down to showing (65). 

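We start from $t = 0$. When the initial estimate $w^{(0)}$ is computed by Rank Centrality, 
the $\ell_2$ estimation error satisfies (Negahban et al., 2012) 

$$\frac{\|w^{(0)} - w\|}{\|w\|} \le c_4 \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} = c_4 \xi_{\min} := \delta \tag{67}$$

with high probability, where $c_4 > 0$ is some universal constant independent of $n$, $p_{\mathrm{obs}}$, $L$ and 
$\Delta_K$. A by-product of this result is an upper bound 

$$\|w^{(0)} - w\|_\infty \le \|w^{(0)} - w\| \le \delta \|w\| \le \delta \sqrt{n} = c_4 \sqrt{\frac{\log n}{p_{\mathrm{obs}} L}}, \tag{68}$$

which together with the fact $\|w^{(0)} - w\|_\infty \le w_{\max} - w_{\min} \le 1$ gives 

$$\|w^{(0)} - w\|_\infty \le \min\left\{ c_4 \sqrt{\frac{\log n}{p_{\mathrm{obs}} L}}, \; 1 \right\} = \min\{ c_4 \xi_{\max}, 1 \}. \tag{69}$$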


This justifies that $w^{(0)}$ satisfies the claim (65). Notably, $w^{(0)}$ is independent of $\mathcal{G}$ and 
$y^{\mathrm{iter}}$; therefore, independent of the iterative steps. 

In what follows, we divide the iterative stage into two phases: (1) $t \le T_0$ and (2) $t > T_0$, 
where $T_0$ is a threshold such that 

$$\xi_t \ge c_{10} \xi_{\min} = c_{10} \sqrt{\frac{\log n}{n p_{\mathrm{obs}} L}} \quad \text{iff} \quad t \le T_0, \tag{70}$$

for some large constant $c_{10} > 0$. As is seen from the definition of $\xi_t$, $T_0 \lesssim \log n$ holds as 
long as $L = O(\mathrm{poly}(n))$. 

For the case where $t \le T_0$, we proceed by induction on $t$ w.r.t. the following hypotheses: 

• $\mathcal{M}_t$: $\|w^{\mathrm{mle}} - w\|_\infty < \frac{1}{2} \xi_t$ holds at the $t$-th iteration (the iteration where we compute 
$w^{(t+1)}$); 

• $\mathcal{B}_t$: all entries $w_i^{(\tau)}$ ($\tau \le t - 1$) satisfying $|w_i^{(\tau)} - w_i| > 1.5 \xi_t$ have been replaced 
by time $t$; 

• $\mathcal{H}_t$: none of the entries $w_i^{(\tau)}$ ($\tau \le t - 1$) satisfying $|w_i^{(\tau)} - w_i| < 0.5 \xi_t$ have been replaced 
by time $t$. 

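We first note that $\mathcal{B}_t$ is an immediate consequence of $\mathcal{M}_t$ and $\mathcal{B}_{t-1}$. In fact, given $\mathcal{B}_{t-1}$, it 
suffices to examine those entries $w_i^{(t)}$ that have not been replaced by time $t - 1$. To this 
end, we recall that Spectral MLE replaces $w_i^{(t)}$ iff $|w_i^{(t)} - w_i^{\mathrm{mle}}| > \xi_t$. With $\mathcal{M}_t$ in place, for 
each $i$ obeying $|w_i^{(t)} - w_i| > 1.5 \xi_t$, one has 

$$\left| w_i^{(t)} - w_i^{\mathrm{mle}} \right| \ge \left| w_i^{(t)} - w_i \right| - \left| w_i^{\mathrm{mle}} - w_i \right| > 1.5 \xi_t - 0.5 \xi_t = \xi_t$$

and hence it will necessarily be replaced by $w_i^{\mathrm{mle}}$ at time $t$. Similarly, $\mathcal{H}_t$ is an immediate 
consequence of $\mathcal{M}_t$ and $\mathcal{H}_{t-1}$.$^3$ As a consequence, it boils down to verifying $\mathcal{M}_t$. 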
When t = 0, applying Theorem [8] and setting we see that 


r„mle _ ^ 


logn 

— ^Tsmin ' ^9 smax 

oo ^Pobs 


for some universal constants cj,cg > 0, where we have made use of the properties (1671) and 
(IMl) . When cio is sufficiently large, the definition of Tq (cf. (1701) 1 gives 

Y ^Pobs-^ 

additionally, < ?max < ^0 holds as long as is sufficiently small. Putting 

these conditions together gives 


- w 


log n 1 

^ Cy^min T C9C4 smax < XsOj 

npohs 2 


which verifies the property AAq. 

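3. Given $\mathcal{M}_t$ and $\mathcal{H}_{t-1}$, for any $i$ obeying $|w_i^{(t)} - w_i| < 0.5 \xi_t$, one has 
$|w_i^{(t)} - w_i^{\mathrm{mle}}| \le |w_i^{(t)} - w_i| + |w_i^{\mathrm{mle}} - w_i| < \frac{1}{2} \xi_t + \frac{1}{2} \xi_t = \xi_t$ 
and, therefore, it cannot be replaced by time $t$, which establishes $\mathcal{H}_t$. 

We now turn to extending these inductive hypotheses to the $t$-th iteration, assuming that 
all of them hold up to time $t - 1$. Taken together $\mathcal{M}_{t-1}$ and $\mathcal{B}_{t-1}$ immediately reveal that 

$$\|w^{(t)} - w\|_\infty \le 1.5 \xi_{t-1}. \tag{71}$$

In order to invoke Theorem 8 for the coordinate-wise MLEs, we need to construct a looser 
auxiliary score estimate $w^{\mathrm{ub}}$. With $\mathcal{B}_{t-1}$, $\mathcal{H}_{t-1}$ and (71) in mind, we propose a candidate 
for the $t$-th iteration as follows$^4$ 

$$w_i^{\mathrm{ub}} = \begin{cases} w_i + 1.5 \xi_{t-1}, & \text{if } |w_i^{(0)} - w_i| > \frac{1}{2} \xi_{t-1}, \\ w_i^{(0)}, & \text{else}, \end{cases} \tag{72}$$

which is clearly independent of $\mathcal{G}$ and $y^{\mathrm{iter}}$. According to $\mathcal{B}_{t-1}$ and $\mathcal{H}_{t-1}$, (i) none of the 
entries $w_i^{(0)}$ with $|w_i^{(0)} - w_i| < \frac{1}{2} \xi_{t-1}$ have been replaced so far; (ii) if an entry $w_i^{(0)}$ has ever 
been replaced, then the error of the new iterate cannot exceed $1.5 \xi_{t-1}$ (otherwise it'll be 
replaced by the MLE in time $t - 1$ which gives an error below $0.5 \xi_{t-1}$). As a result, $w^{\mathrm{ub}}$ 
satisfies 

$$\left| w_i^{(t)} - w_i \right| \le \left| w_i^{\mathrm{ub}} - w_i \right| \le 1.5 \xi_{t-1} \tag{73}$$

and 

$$\|w^{(t)} - w\| \le \|w^{\mathrm{ub}} - w\| \overset{(a)}{\le} 3 \|w^{(0)} - w\|. \tag{74}$$

Here, (a) arises since: (1) due to $\mathcal{H}_{t-1}$, if $w_i^{(0)}$ is ever replaced, then $|w_i^{(0)} - w_i|$ is at least 
$0.5 \xi_{t-1}$; (2) by construction, the pointwise error of $w_i^{\mathrm{ub}}$ is at most $1.5 \xi_{t-1}$, and hence the 
replacement cannot inflate the original error $|w_i^{(0)} - w_i|$ by more than $\frac{1.5}{0.5} = 3$ times. 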


With these in place, applying Theorem [8] gives 




- w 


log n 

— Cgsmin l-Scg St—1? 

^Pobs 


which relies on the fact 5 < 


log n 
^Pobs^ 


Recognize that 


log n 

6 > cs^min and l.Scg- (t-i < 6 

^Pohs 

hold in the regime where t <Tq and i2S21 which taken together give 

^Pobs 


w”*'’ - w 


W 

< 

oo Z 


as claimed in A4t. Having verified these inductive hypotheses, we see from the above 
argument that in any event, the l^o error bound at the iteration is at most 1.5^t, which 
in turn leads to the claim ([651) for any t <Tq. 

4. Careful readers will note that when $|w_i^{(0)} - w_i| > \frac{1}{2} \xi_{t-1}$, the resulting $w_i^{\mathrm{ub}}$ might exceed the range 
$[w_{\min}, w_{\max}]$. This can be easily addressed if we do the following: (1) change $w_i^{\mathrm{ub}}$ to $w_i - 1.5 \xi_{t-1}$ instead 
if $w_i - 1.5 \xi_{t-1} \in [w_{\min}, w_{\max}]$; (2) if it is still infeasible, set $w_i^{\mathrm{ub}}$ to be $w_{\max}$ if $|w_i - w_{\max}| > |w_i - w_{\min}|$ 
and $w_{\min}$ otherwise. For simplicity of presentation, however, we omit these boundary situations and 
assume $w_i + 1.5 \xi_{t-1} \le w_{\max}$ throughout, which will not change the results anyway. 

Starting from $t = T_0 + 1$, we fix the auxiliary score as follows 




Wi + 1.5^To, if “ '“^*1 > 5^' 


w. 


(0) 


else, 


(75) 


where we recall that ^oo = C 3 .^min and = cioCmin- This satisfies 


(t) 

w] - Wi 


< 


— Wi 


< l.sero 


for t = Tq+I, due to M-Tq and Btq- Moreover, the number of indices that satisfy —Wi\ > 
^?cxD, denoted by k, obeys 


^ ■ ( 2^°° I — 


w — 


< mllwl 


k < 


4(5^||m| 

Soo 


which further gives 


-w 


< 


,(o) _ 


' — w 


< 1 + 


+ (l.S^To)^ — + 2.25k^^^ 

i\ |uil°'-lOi|>iAoo 


!Zo 

oo 


Note that the preceding analysis does not depend on the ratio ^ as long as both C 3 and 


cio are large. If we pick ^ < -\/2, then the above inequality gives rise to 


C3 


-w 


< Vl 9 ( 5 || 




Applying Theorem [ 8 ] we deduce 


- w 


npobs 


logn 


_ _ Jogr^ ^ 1^ 

npohsL V npohsL 2*’°°’ 


as long as is small and cio,C 3 are sufficiently large. 

Pobs^ 

The main point of the above calculation is that: for any entry tc® satisfying \wf’'^—Wi\ < 
^^ 00 ) one must have 


(0) _ mle 


< 


( 0 ) 

w'- - Wi 


+ 


(mle) 

W] - Wi 


^ ^00 < Ct) 


and hence it will not be replaced. As a result, the auxiliary score (I75h remains valid for the 
iteration that follows. In fact, these properties continue to hold for all f > Tq if we repeat 
the same argument as t increases. To finish up, put together the above arguments to obtain 


— w 


— r,^00 

OO Z 


logn 


5 t > Tq, 


^PobsT 

which establishes the claim (I65p for t > Tq and, in turn. Theorem [71 
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Appendix B. Proof of the Minimax Lower Bonnd (Theorem [3]) 

This section establishes the minimax lower limit given in Theorem [31 To bound the min¬ 
imax probability of error, we proceed by constructing a finite set of hypotheses, followed 
by an analysis based on classical Fano-type arguments. For notational simplicity, each hy¬ 
pothesis is represented by a permutation cr over [n], and we denote by a{i) and a ([A]) the 
corresponding index of the ranked item and the index set of all top-A items, respectively. 

We now single out a set of hypotheses and some prior to be imposed on them. Suppose 
that the values of w are fixed up to permutation in such a way that 




wri 1 < * < a, 
wr+i, K < i <n, 


where we abuse the notation wr,wrj^i to represent any two values satisfying 


WR - WR^, ^ ^ 0 . 

'U^max 

Below we suppose that the ranking scheme is informed of the values $w_K, w_{K+1}$, which only
makes the ranking task easier. In addition, we impose a uniform prior over a collection $\mathcal{M}$
of $M := \max\{K, n-K\} + 1$ hypotheses regarding the permutation: if $K < n/2$, then

$$\mathbb{P}\{\sigma([K]) = S\} = \frac{1}{M}, \quad \text{for } S = \{2, \cdots, K\} \cup \{i\}, \quad (i = 1, K+1, \cdots, n); \tag{76}$$

if $K \ge n/2$, then

$$\mathbb{P}\{\sigma([K]) = S\} = \frac{1}{M}, \quad \text{for } S = \{1, \cdots, K+1\} \setminus \{i\}, \quad (i = 1, \cdots, K+1). \tag{77}$$

In words, each alternative hypothesis is generated by swapping two indices of the hypothesis
obeying $\sigma([K]) = [K]$. Denoting by $P_{e,M}$ the average probability of error with respect to
the prior we construct, one can easily verify that the minimax probability of error is at least
$P_{e,M}$.
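
As a concrete illustration of the hypothesis class in (76) and (77), the following minimal Python sketch enumerates the top-$K$ index sets for small $n$ and $K$. The function name `hypothesis_top_sets` is hypothetical and introduced only for this illustration; items are indexed $1, \cdots, n$ as in the text, and the second branch is assumed to cover every $K \ge n/2$.

```python
def hypothesis_top_sets(n, K):
    """Enumerate the candidate top-K index sets sigma([K]) of (76)-(77).

    Illustrative helper (not from the paper); items are labelled 1..n.
    """
    if not 0 < K < n:
        raise ValueError("need 0 < K < n")
    if K < n / 2:
        # (76): keep items 2..K and let item i take the remaining slot,
        # where i ranges over {1, K+1, ..., n}.
        base = frozenset(range(2, K + 1))
        return [base | {i} for i in [1, *range(K + 1, n + 1)]]
    # (77): remove one item i from {1, ..., K+1}, where i ranges over 1..K+1.
    full = frozenset(range(1, K + 2))
    return [full - {i} for i in range(1, K + 2)]


if __name__ == "__main__":
    for n, K in [(6, 2), (6, 4)]:
        sets = hypothesis_top_sets(n, K)
        print(n, K, len(sets), [sorted(s) for s in sets])
```

In both branches the number of sets equals $\max\{K, n-K\} + 1$, every set has exactly $K$ elements, and any two sets differ by swapping a single pair of items, which is what keeps the pairwise KL divergences small in the Fano argument.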

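This Bayesian probability of error will be bounded using classical Fano-type bounds. To
accommodate partial observation, we introduce an erased version of $y_{i,j} := (y_{i,j}^{(1)}, \cdots, y_{i,j}^{(L)})$
such that

$$z_{i,j} = \begin{cases} y_{i,j}, & \text{with probability } p_{\mathrm{obs}}, \\ \text{erasure}, & \text{else}, \end{cases}$$

and set $Z := \{z_{i,j}\}_{i \ne j}$. Applying the generalized Fano inequality (Han and Verdú, 1994, Theorem 7) gives

$$
\begin{aligned}
P_{e,M} &\ge 1 - \frac{1}{\log M} \left\{ \frac{1}{M^2} \sum_{\sigma_1, \sigma_2 \in \mathcal{M}} \mathrm{KL}\left( \mathbb{P}_{Z \mid \sigma = \sigma_1} \,\|\, \mathbb{P}_{Z \mid \sigma = \sigma_2} \right) + \log 2 \right\} \\
&\overset{(a)}{=} 1 - \frac{1}{\log M} \left\{ \frac{1}{M^2} \sum_{\sigma_1, \sigma_2 \in \mathcal{M}} \sum_{i \ne j} \mathrm{KL}\left( \mathbb{P}_{z_{i,j} \mid \sigma = \sigma_1} \,\|\, \mathbb{P}_{z_{i,j} \mid \sigma = \sigma_2} \right) + \log 2 \right\} \\
&\overset{(b)}{=} 1 - \frac{1}{\log M} \left\{ \frac{p_{\mathrm{obs}}}{M^2} \sum_{\sigma_1, \sigma_2 \in \mathcal{M}} \sum_{i \ne j} \mathrm{KL}\left( \mathbb{P}_{y_{i,j} \mid \sigma = \sigma_1} \,\|\, \mathbb{P}_{y_{i,j} \mid \sigma = \sigma_2} \right) + \log 2 \right\} \\
&\overset{(c)}{=} 1 - \frac{1}{\log M} \left\{ \frac{p_{\mathrm{obs}} L}{M^2} \sum_{\sigma_1, \sigma_2 \in \mathcal{M}} \sum_{i \ne j} \mathrm{KL}\left( \mathbb{P}_{y_{i,j}^{(1)} \mid \sigma = \sigma_1} \,\|\, \mathbb{P}_{y_{i,j}^{(1)} \mid \sigma = \sigma_2} \right) + \log 2 \right\} \\
&\overset{(d)}{\ge} 1 - \frac{1}{\log M} \left\{ \frac{2 w_{\max}^4}{w_{\min}^4} n p_{\mathrm{obs}} L \Delta_K^2 + \log 2 \right\},
\end{aligned}
$$

where $\mathrm{KL}(P \,\|\, Q)$ denotes the KL divergence of $Q$ from $P$. Here, (a) comes from the
independence assumption of the $z_{i,j}$; (b) arises since $z_{i,j}$ is an erased version of $y_{i,j}$; (c)
follows since $y_{i,j}^{(l)}$ ($1 \le l \le L$) are i.i.d.; and (d) arises from Lemma 10 (see below).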
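Consequently, one would have $P_e \ge P_{e,M} \ge \epsilon$ if

$$\frac{2 w_{\max}^4}{w_{\min}^4} n p_{\mathrm{obs}} L \Delta_K^2 \le (1 - \epsilon) \log M - \log 2.$$

Since $|\mathcal{M}| = M \ge n/2$, the above condition is necessarily satisfied when

$$\frac{2 w_{\max}^4}{w_{\min}^4} n p_{\mathrm{obs}} L \Delta_K^2 \le (1 - \epsilon) \log n - 2 \quad \Longleftrightarrow \quad L \le \frac{w_{\min}^4}{2 w_{\max}^4} \cdot \frac{(1 - \epsilon) \log n - 2}{n p_{\mathrm{obs}} \Delta_K^2},$$

which finishes the proof.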
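
To make the final condition tangible, the sketch below evaluates the Fano-type lower bound on $P_{e,M}$ and the resulting threshold on $L$ for given problem parameters. It assumes the constants take the form $2 w_{\max}^4 / w_{\min}^4$ as in Lemma 10; the function names `fano_error_lower_bound` and `max_trials_for_error` are hypothetical and serve only this illustration.

```python
import math


def fano_error_lower_bound(n, K, p_obs, L, delta_K, w_min, w_max):
    """Lower bound on the Bayesian error P_{e,M} from the Fano-type chain.

    Illustrative helper: plugs the Lemma 10 bound on the summed KL
    divergence into 1 - (KL term + log 2) / log M.
    """
    M = max(K, n - K) + 1  # number of hypotheses
    kl_term = 2 * (w_max / w_min) ** 4 * n * p_obs * L * delta_K ** 2
    return 1 - (kl_term + math.log(2)) / math.log(M)


def max_trials_for_error(n, p_obs, delta_K, w_min, w_max, eps):
    """Largest L for which the sufficient condition for P_e >= eps holds.

    A non-positive return value means the condition is vacuous for these
    parameters (n is too small relative to eps).
    """
    numerator = (1 - eps) * math.log(n) - 2
    return (w_min / w_max) ** 4 / 2 * numerator / (n * p_obs * delta_K ** 2)


if __name__ == "__main__":
    n, K, p_obs, delta_K, eps = 1000, 10, 0.1, 0.05, 0.25
    L_max = max_trials_for_error(n, p_obs, delta_K, 0.9, 1.0, eps)
    print("threshold on L:", L_max)
    print("error lower bound at that L:",
          fano_error_lower_bound(n, K, p_obs, L_max, delta_K, 0.9, 1.0))
```

At the threshold value of $L$ the computed lower bound stays above $\epsilon$, reflecting the slack introduced when $\log M$ is replaced by $\log n$ and $\log 2$ by a larger constant.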

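Lemma 10 If $w_K, w_{K+1} \in [w_{\min}, w_{\max}]$, then for any $\sigma_1, \sigma_2 \in \mathcal{M}$:

$$\sum_{i \ne j} \mathrm{KL}\left( \mathbb{P}_{y_{i,j}^{(1)} \mid \sigma = \sigma_1} \,\|\, \mathbb{P}_{y_{i,j}^{(1)} \mid \sigma = \sigma_2} \right) \le \frac{2 w_{\max}^4}{w_{\min}^4} n \Delta_K^2. \tag{78}$$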


Proof To start with, for any two measures $P \sim \mathrm{Bernoulli}(p)$ and $Q \sim \mathrm{Bernoulli}(q)$, one
has (van Erven and Harremoës, 2014, Eqn. (7))

$$\mathrm{KL}(P \,\|\, Q) \le \chi^2(P \,\|\, Q) = \frac{(p - q)^2}{q} + \frac{(p - q)^2}{1 - q} = \frac{(p - q)^2}{q (1 - q)}, \tag{79}$$

where $\chi^2(P \,\|\, Q)$ denotes the $\chi^2$ divergence between $P$ and $Q$.
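
Recall that given $\sigma = \sigma_1$ (resp. $\sigma = \sigma_2$), $y_{i,j}^{(1)}$ is Bernoulli distributed with mean

$$r_1 := \frac{w_{\sigma_1(i)}}{w_{\sigma_1(i)} + w_{\sigma_1(j)}} \quad \left( \text{resp. } r_2 := \frac{w_{\sigma_2(i)}}{w_{\sigma_2(i)} + w_{\sigma_2(j)}} \right).$$

If we set $\delta = r_1 - r_2$, then (79) yields

$$\mathrm{KL}\left( \mathbb{P}_{y_{i,j}^{(1)} \mid \sigma = \sigma_1} \,\|\, \mathbb{P}_{y_{i,j}^{(1)} \mid \sigma = \sigma_2} \right) \le \frac{\delta^2}{r_2 (1 - r_2)} \le \frac{4 w_{\max}^2}{w_{\min}^2} \delta^2,$$

where the last inequality follows since

$$r_2 (1 - r_2) = \frac{w_{\sigma_2(i)} w_{\sigma_2(j)}}{\left( w_{\sigma_2(i)} + w_{\sigma_2(j)} \right)^2} \ge \frac{w_{\min}^2}{4 w_{\max}^2}.$$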


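By construction, conditional on any hypotheses $\sigma_1, \sigma_2 \in \mathcal{M}$, the distributions of $y_{i,j}^{(1)}$ are
different over at most $2n$ locations. For each of these $O(n)$ locations, our construction of
$\mathcal{M}$ ensures that

$$|\delta| = |r_2 - r_1| \le \frac{w_K}{w_K + w_{K+1}} - \frac{w_{K+1}}{w_K + w_{K+1}} = \frac{w_K - w_{K+1}}{w_K + w_{K+1}} \le \frac{w_{\max}}{2 w_{\min}} \Delta_K.$$

As a result, the total contribution is bounded above by

$$\sum_{i \ne j} \mathrm{KL}\left( \mathbb{P}_{y_{i,j}^{(1)} \mid \sigma = \sigma_1} \,\|\, \mathbb{P}_{y_{i,j}^{(1)} \mid \sigma = \sigma_2} \right) \le 2n \cdot \left( \max \delta^2 \right) \frac{4 w_{\max}^2}{w_{\min}^2} \le \frac{2 w_{\max}^4}{w_{\min}^4} n \Delta_K^2.$$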
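
The bound of Lemma 10 can be sanity-checked numerically on a small instance. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes $w_{\max} = w_K$ and $w_{\min} = w_{K+1}$, assumes the first item of a pair wins with probability $w_i / (w_i + w_j)$, and sums over all ordered pairs $i \ne j$ (which does not depend on that convention). The helper names `bernoulli_kl`, `chi2`, and `total_kl` are hypothetical.

```python
import math


def bernoulli_kl(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in nats, for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def chi2(p, q):
    """Chi-square divergence between Bernoulli(p) and Bernoulli(q), as in (79)."""
    return (p - q) ** 2 / (q * (1 - q))


def total_kl(top1, top2, n, w_hi, w_lo):
    """Sum over ordered pairs i != j of the single-comparison KL divergence.

    top1 and top2 are the top-K item sets of the two hypotheses; items in
    the top set get score w_hi, all other items get score w_lo.
    """
    s1 = [w_hi if i in top1 else w_lo for i in range(1, n + 1)]
    s2 = [w_hi if i in top2 else w_lo for i in range(1, n + 1)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                r1 = s1[i] / (s1[i] + s1[j])  # mean under hypothesis 1
                r2 = s2[i] / (s2[i] + s2[j])  # mean under hypothesis 2
                total += bernoulli_kl(r1, r2)
    return total


if __name__ == "__main__":
    n, w_hi, w_lo = 8, 1.0, 0.8
    delta_K = (w_hi - w_lo) / w_hi
    bound = 2 * (w_hi / w_lo) ** 4 * n * delta_K ** 2
    print("summed KL:", total_kl({1, 2, 3}, {2, 3, 4}, n, w_hi, w_lo))
    print("Lemma 10 bound:", bound)
```

Only pairs touching one of the two swapped items contribute to the sum, which is why the total scales linearly in $n$ rather than quadratically.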


Appendix C. Bernstein Inequality

Our analysis relies on the Bernstein inequality. To simplify presentation, we state below a
user-friendly version of the Bernstein inequality.

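Lemma 11 Consider $n$ independent random variables $z_l$ ($1 \le l \le n$), each satisfying
$|z_l| \le B$. For any $a \ge 2$, one has

$$\left| \sum_{l=1}^{n} z_l - \mathbb{E}\left[ \sum_{l=1}^{n} z_l \right] \right| \le \sqrt{2 a \log n \sum_{l=1}^{n} \mathbb{E}\left[ z_l^2 \right]} + \frac{2a}{3} B \log n \tag{80}$$

with probability at least $1 - 2 n^{-a}$.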

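This is an immediate consequence of the well-known Bernstein inequality

$$\mathbb{P}\left\{ \left| \sum_{l=1}^{n} z_l - \mathbb{E}\left[ \sum_{l=1}^{n} z_l \right] \right| > t \right\} \le 2 \exp\left( - \frac{\frac{1}{2} t^2}{\sum_{l=1}^{n} \mathbb{E}\left[ z_l^2 \right] + \frac{1}{3} B t} \right). \tag{81}$$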
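
For completeness, here is a short sketch of how the bound in Lemma 11 can be obtained from the Bernstein inequality. It assumes that (81) takes the standard form with denominator $\sum_l \mathbb{E}[z_l^2] + Bt/3$ and that the second term of (80) carries the constant $2a/3$; the shorthands $V$ and $c$ are introduced only for this derivation.

```latex
% Shorthand (introduced here): V := \sum_{l=1}^n E[z_l^2],  c := a \log n.
% The right-hand side of (81) is at most 2 e^{-c} = 2 n^{-a} whenever
\[
  \frac{t^2/2}{V + Bt/3} \ge c
  \quad\Longleftrightarrow\quad
  t^2 - \frac{2cB}{3}\, t - 2cV \ge 0 .
\]
% The quadratic is non-negative for every t at or above its larger root:
\[
  t \ge \frac{cB}{3} + \sqrt{\frac{c^2 B^2}{9} + 2cV} .
\]
% Using \sqrt{x + y} \le \sqrt{x} + \sqrt{y}, the larger root is at most
\[
  \frac{cB}{3} + \frac{cB}{3} + \sqrt{2cV}
  = \sqrt{2 a \log n \cdot V} + \frac{2a}{3} B \log n ,
\]
% so choosing t equal to this last expression gives a failure
% probability of at most 2 n^{-a}, which is the statement of (80).
```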